Open jmason42 opened 6 years ago
How UniqueObjectsContainer
works and its usage pattern are not obvious to me. It looks vaguely like a Python-implemented dict of ndarrays. (If it's frequently accessed and can't do the heavy lifting via built-in types, it could be worth writing a Cython extension type.) We can discuss it before Thursday's WC team meeting.
Guessing on that part, my going-in thought is the approach that keeps previous + next instances of UniqueObjectsContainer
(or of its guts) in memory: snapshot the current state, modify one of the two, then accept one by making a fresh copy of the other or else by copying in the other's contents, whichever turns out to be faster. This isolates the revert feature in code that's easy to test and could be implemented with an optimized loop.
If the ndarrays get replaced rather than modified, the collection could be snapshotted via a fast shallow copy like a_dict.copy()
.
Tracking diffs is viable but would be more complicated. That's OK if all the changes are done via a small number of methods, and it could save a lot of memory, but if code outside this class has to help with diff-tracking then it'd be a "leaky abstraction," hard to get right, and hard to maintain. If memory is a problem then I'd first look at making the collection more compact, e.g. _entryState
is an np.int64
but maybe only holds a boolean?
You're correct, it's a dictionary of ndarrays. More specifically, the ndarrays are 'structured' arrays, and the arrays themselves are resized to accommodate more elements (I don't remember the exact pattern, but it tries to expand in blocks rather than an element at a time so that it's not constantly creating huge new arrays). I'm not sure how much value Cython would add, as the frequently used operations should all be vectorized.
I think you're right about the 'diff' approach, I could see it breaking down if the interface isn't really tight. The operations should be pretty limited and manageable but it does mean checking that no part of the code is sneaking into the class's private attributes. Copying will probably be the way to go.
I have a plan for this that involves encoding updates as "messages" (much like your "diffs" above), then the list of messages can be processed to resolve any conflicts and finally applied once they pass through the resolver. Resolution can be a series of independent custom "filters" that basically take the list of updates and emit a new list of (possibly) altered updates based on each resolver's own internal logic.
Once this is complete then we will be able to access the container in "read-only" mode from any process and we can shed the cumbersome request structure and avoid ever mutating the unique objects in place.
Hopefully the resolutions can be largely generic rather than case specific. Trying to think of some useful test cases, here's what I got:
Minor note: if you're thinking about applying updates as resolved diffs, you could also consider saving those diffs instead of saving the container each time step. Presumably it would use less disk space (so long as most unique molecules live for at least a few time steps).
@jmason42 Great considerations, thank you. Yes I went through a similar exercise in my mind and most conflicts resolve easily. In your case with simultaneous alteration and deletion, I assert we either find a copacetic resolution (which I think is possible in this case, we would have to explore the details) or in the extreme case where it can't be resolved we discard all the updates and trigger the recomputation of a shorter time step (as described in https://github.com/CovertLab/wcEcoli/issues/401). This is what we would do in the case of negative molecule counts in bulk molecules which is basically the same situation: a state that cannot be resolved given the current set of updates.
It's a bit harder than making the time step shorter, due to the biased dice re-rolling metaphor we discussed a few days back. But at its heart, the idea is correct. Ideally it would work something like this: 1) The two processes calculate their updates for the full time step.
This requires a little more information (when did the deletion occur?) and some infrastructure that we don't currently yet have.
Just verified this function never gets called (at least under normal conditions for a single generation), which simplifies things a bit: https://github.com/CovertLab/wcEcoli/blob/master/wholecell/containers/unique_objects_container.py#L797
Towards the end of dynamic time steps, it will be necessary to detect when simulation steps have failed (e.g. does not meet some minimal accuracy criterion), and if so, revert the state of the system back to the previous step. A few things make this challenging:
BulkMolecules
state does not finalize a step until its.merge
method is called, the underlyingUniqueMolecules
object is directly modified by processes.The latter was done by myself as a lazy optimization - if processes directly modified their allocated portion of the state, we wouldn't need to worry about the overhead associated with updating and maintaining the very large
UniqueObjectsContainer
object. However, reversion requires that we have some way to recover the old state as needed. I'd appreciate @1fish2's ideas on how to approach this, but I have three of my own:UniqueObjectsContainer
s in memory. One will be the previous state, and the other will be the (to be) modified state. If the step is accepted, we copy the modified state into the previous state, and vice-versa if the step is rejected. This is a great short-term solution, but I could see this having a lot of overhead (memory and CPU time). If we're careful we may be able to write one container into another's memory, which could be more efficient (less garbage collection from discarding old containers).UniqueObjectsContainer
objects, but track updates as little 'diffs'. These 'diffs' can then be undone as needed. I like this idea because it keeps the usual, desired behavior (accept and apply) fast, at the cost of the hopefully exceptional behavior (reject and undo) being the slower, harder operation.UniqueObjectsContainer
to disk because it's so large (and not useful for analysis).