thalassemia closed this 1 year ago
This is a fantastic optimization! I'm kind of sad we lose the ability to use keys into the state to retrieve values, but it is definitely worth this runtime improvement. Numpy everywhere! Regarding points 2 and 3, I think the performance gains make this worth it, so I am in support. Please comment the reasoning behind `UniqueNumpyUpdater` very well so future people can follow and understand why this is required.
Changes
`_entryState` marks whether a given row contains an active unique molecule. Inactive rows are identified through the `_entryState` column, so new molecules can be added in those inactive rows instead of growing the array. One downside to this approach is that updates that modify unique molecule attributes must occur before those that add or delete molecules. For example, if you first deleted a molecule and then added a new molecule in that row, a subsequent update trying to modify that row would have no idea that the molecule it was expecting at that position no longer exists, resulting in the improper modification of a completely unrelated molecule. wcEcoli gets around this by accumulating updates until all evolvers have run, then sorting them before applying them. To mimic this functionality, I made a hacky updater that is actually a method of a class, allowing me to accumulate updates using instance variables.
https://github.com/CovertLab/vivarium-ecoli/blob/5e29fc7c777dea1b747ed9425319d379f4abd2fa/ecoli/library/schema.py#L188-L240

Then I added a Step called `UniqueUpdate` that runs after all the processes but before any other Steps. This Step signals to each unique molecule updater that it is time to apply all the accumulated updates and clear the instance variables. This gets complicated with cases like `ChromosomeStructure`, a Step that sends updates to unique molecule stores. For now, I’ve added some boilerplate in `ecoli_master` to automatically add a `UniqueUpdate` Step after every other Step in the composite (right now we technically only need one more after `ChromosomeStructure`, but I figured this solution is more futureproof and has minimal performance impact).
https://github.com/CovertLab/vivarium-ecoli/blob/5e29fc7c777dea1b747ed9425319d379f4abd2fa/ecoli/composites/ecoli_master.py#L244-L250

`EngineProcess` to get the initial state for the inner simulation by calling the `initial_state` method of the `Composite` generated by calling `generate` on `self.parameters['inner_composer']`
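
To make the ordering problem with unique molecule updates concrete, here is a minimal sketch of an accumulate-then-apply updater over a structured Numpy array. It is not the code in `schema.py`: the class name `AccumulatingUpdater`, the update keys (`set`, `delete`, `add`, `flush`), and every field other than `_entryState` are assumptions made up for this example. The `flush` key stands in for the signal that the `UniqueUpdate` Step sends.

```python
import numpy as np

# Only `_entryState` comes from the PR; the other fields are illustrative.
DTYPE = np.dtype([
    ("_entryState", np.int8),   # 1 = row holds an active molecule, 0 = free row
    ("unique_index", np.int64),
    ("coordinates", np.int64),
])


class AccumulatingUpdater:
    """Toy updater: queue updates, then apply them in a safe order on flush."""

    def __init__(self):
        self.pending_set = []     # (row, field, value) attribute changes
        self.pending_delete = []  # rows to mark inactive
        self.pending_add = []     # dicts of field -> value for new molecules

    def updater(self, current, update):
        # Accumulate in instance variables; state is untouched until flush.
        self.pending_set.extend(update.get("set", []))
        self.pending_delete.extend(update.get("delete", []))
        self.pending_add.extend(update.get("add", []))
        if update.get("flush"):
            return self._apply(current)
        return current

    def _apply(self, state):
        state = state.copy()
        # 1. Attribute changes first: rows still hold the molecules that
        #    the processes saw when they computed their updates.
        for row, field, value in self.pending_set:
            state[field][row] = value
        # 2. Deletions only flip the flag, leaving the row free for reuse.
        for row in self.pending_delete:
            state["_entryState"][row] = 0
        # 3. Additions reuse free rows and grow the array only if needed.
        free = np.flatnonzero(state["_entryState"] == 0)
        shortfall = len(self.pending_add) - free.size
        if shortfall > 0:
            grown = np.zeros(state.size + shortfall, dtype=state.dtype)
            grown[:state.size] = state
            state = grown
            free = np.flatnonzero(state["_entryState"] == 0)
        for row, attrs in zip(free, self.pending_add):
            for field, value in attrs.items():
                state[field][row] = value
            state["_entryState"][row] = 1
        self.pending_set.clear()
        self.pending_delete.clear()
        self.pending_add.clear()
        return state


molecules = np.zeros(3, dtype=DTYPE)
molecules["_entryState"] = [1, 1, 0]
molecules["unique_index"] = [10, 11, 0]
molecules["coordinates"] = [100, 200, 0]

upd = AccumulatingUpdater()
# One process deletes the molecule in row 0 and adds a new molecule...
state = upd.updater(molecules, {
    "delete": [0],
    "add": [{"unique_index": 12, "coordinates": 300}],
})
# ...and a later process still expects the old molecule in row 0.
state = upd.updater(state, {"set": [(0, "coordinates", 150)]})
# Stand-in for the UniqueUpdate signal: apply everything, then clear.
state = upd.updater(state, {"flush": True})
```

Applied in arrival order, the `set` would have overwritten the coordinates of the new molecule 12 that took over row 0. With the queue sorted into modify, delete, add, the change lands on the old molecule before it is removed, and the new molecule keeps its own coordinates.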
`CellDivision` process. Fixes issue in colony simulations where threshold was calculated using the dry mass at `('listeners', 'dry_mass')`, which is not 100% accurate immediately after division (before `MassListener` has run). This entailed adding a new `DivisionDetector` Step to the inner simulation which sets a flag telling `EngineProcess` it is time to divide.

`build_ecoli` and `run` methods of `EcoliSim`. Users should now call `build_ecoli` before calling `run` (more flexibility to modify composite before running).

`Clock` process for `MassListener` to calculate `expected_mass_fold_change`
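
The division change above can be pictured with a small plain-Python sketch. It does not use the vivarium API and is not the actual `DivisionDetector`: the function names, the state layout, and especially the `double_mass` threshold rule are placeholders assumed for illustration. The point it shows is only the ordering: the detector runs inside the inner simulation after the mass listener, so the threshold is derived from an up-to-date dry mass, and the outer process reads nothing but a flag.

```python
def mass_listener_step(state):
    # Recompute dry mass from the current contents of the cell.
    state["listeners"]["dry_mass"] = sum(state["masses"].values())


def division_detector_step(state, threshold_from_mass):
    # Runs after the mass listener, so dry_mass is already refreshed.
    dry_mass = state["listeners"]["dry_mass"]
    if state["division_threshold"] is None:
        state["division_threshold"] = threshold_from_mass(dry_mass)
    state["division_flag"] = dry_mass >= state["division_threshold"]


def inner_timestep(state, growth, threshold_from_mass):
    for name in state["masses"]:
        state["masses"][name] += growth  # toy growth model
    mass_listener_step(state)
    division_detector_step(state, threshold_from_mass)


def outer_should_divide(state):
    # The outer process only reads the flag set by the inner simulation.
    return state["division_flag"]


def double_mass(dry_mass):
    # Placeholder rule, assumed for this example only.
    return 2 * dry_mass


# A daughter cell right after division: the listener value is stale
# (left over from the mother) until the mass listener runs again.
daughter = {
    "masses": {"protein": 150.0, "rna": 50.0},
    "listeners": {"dry_mass": 400.0},
    "division_threshold": None,
    "division_flag": False,
}
inner_timestep(daughter, 5.0, double_mass)
```

Had the threshold been computed from the stale listener value of 400.0 before the first inner timestep, it would have been roughly twice too large in this toy setup.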
Results
Runtime down to <10 minutes per cell cycle and simulation results perfectly match wcEcoli branch commit c4261d97 (compare to #174). Notably, while this matches the runtime of the reference wcEcoli commit, vivarium-ecoli becomes noticeably (~35%) faster when simulation results are stored in RAM instead of being emitted to MongoDB on disk at every timestep. Further optimization may be possible by using bulk insert operations (`insert_many` instead of `insert_one`).
Todo