CovertLab / wcEcoli

Whole Cell Model of E. coli
Other
18 stars 4 forks source link

Time-step reversion (for unique molecules) #151

Open jmason42 opened 6 years ago

jmason42 commented 6 years ago

Towards the end of dynamic time steps, it will be necessary to detect when simulation steps have failed (e.g. does not meet some minimal accuracy criterion), and if so, revert the state of the system back to the previous step. A few things make this challenging:

The latter was done by myself as a lazy optimization - if processes directly modified their allocated portion of the state, we wouldn't need to worry about the overhead associated with updating and maintaining the very large UniqueObjectsContainer object. However, reversion requires that we have some way to recover the old state as needed. I'd appreciate @1fish2's ideas on how to approach this, but I have three of my own:

1fish2 commented 6 years ago

How UniqueObjectsContainer works and its usage pattern are not obvious to me. It looks vaguely like a Python-implemented dict of ndarrays. (If it's frequently accessed and can't do the heavy lifting via built-in types, it could be worth writing a Cython extension type.) We can discuss it before Thursday's WC team meeting.

Guessing on that part, my going-in thought is the approach that keeps previous + next instances of UniqueObjectsContainer (or of its guts) in memory: snapshot the current state, modify one of the two, then accept one by making a fresh copy of the other or else by copying in the other's contents, whichever turns out to be faster. This isolates the revert feature in code that's easy to test and could be implemented with an optimized loop.

If the ndarrays get replaced rather than modified, the collection could be snapshotted via a fast shallow copy like a_dict.copy().

Tracking diffs is viable but would be more complicated. That's OK if all the changes are done via a small number of methods, and it could save a lot of memory, but if code outside this class has to help with diff-tracking then it'd be a "leaky abstraction," hard to get right, and hard to maintain. If memory is a problem then I'd first look at making the collection more compact, e.g. _entryState is an np.int64 but maybe only holds a boolean?

jmason42 commented 6 years ago

You're correct, it's a dictionary of ndarrays. More specifically, the ndarrays are 'structured' arrays, and the arrays themselves are resized to accommodate more elements (I don't remember the exact pattern, but it tries to expand in blocks rather than an element at a time so that it's not constantly creating huge new arrays). I'm not sure how much value Cython would add, as the frequently used operations should all be vectorized.

I think you're right about the 'diff' approach, I could see it breaking down if the interface isn't really tight. The operations should be pretty limited and manageable but it does mean checking that no part of the code is sneaking into the class's private attributes. Copying will probably be the way to go.

prismofeverything commented 5 years ago

I have a plan for this that involves encoding updates as "messages" (much like your "diffs" above), then the list of messages can be processed to resolve any conflicts and finally applied once they pass through the resolver. Resolution can be a series of independent custom "filters" that basically take the list of updates and emit a new list of (possibly) altered updates based on each resolver's own internal logic.

Once this is complete then we will be able to access the container in "read-only" mode from any process and we can shed the cumbersome request structure and avoid ever mutating the unique objects in place.

jmason42 commented 5 years ago

Hopefully the resolutions can be largely generic rather than case specific. Trying to think of some useful test cases, here's what I got:

Minor note: if you're thinking about applying updates as resolved diffs, you could also consider saving those diffs instead of saving the container each time step. Presumably it would use less disk space (so long as most unique molecules live for at least a few time steps).

prismofeverything commented 5 years ago

@jmason42 Great considerations, thank you. Yes I went through a similar exercise in my mind and most conflicts resolve easily. In your case with simultaneous alteration and deletion, I assert we either find a copacetic resolution (which I think is possible in this case, we would have to explore the details) or in the extreme case where it can't be resolved we discard all the updates and trigger the recomputation of a shorter time step (as described in https://github.com/CovertLab/wcEcoli/issues/401). This is what we would do in the case of negative molecule counts in bulk molecules which is basically the same situation: a state that cannot be resolved given the current set of updates.

jmason42 commented 5 years ago

It's a bit harder than making the time step shorter, due to the biased dice re-rolling metaphor we discussed a few days back. But at its heart, the idea is correct. Ideally it would work something like this: 1) The two processes calculate their updates for the full time step.

This requires a little more information (when did the deletion occur?) and some infrastructure that we don't currently yet have.

prismofeverything commented 5 years ago

Just verified this function never gets called (at least under normal conditions for a single generation), which simplifies things a bit: https://github.com/CovertLab/wcEcoli/blob/master/wholecell/containers/unique_objects_container.py#L797