Separate task preprocessing from simulation execution

jonrkarr commented 3 years ago

- [x] Refactor simulators
  - [x] AMICI
  - [x] BioNetGen
  - [x] BoolNet
  - [x] CBMpy
  - [x] COBRApy
  - [x] COPASI
  - [x] GillesPy2
  - [x] GINsim
  - [x] libSBMLSim
  - [x] MASSpy
  - [x] OpenCOR
  - [x] pyNeuroML/NEURON/NetPyNE
  - [x] PySCeS
  - [x] RBApy
  - [x] Smoldyn
  - [x] tellurium
  - [x] XPP
- [x] Update integrated BioSimulators pipenv and Docker image
- [x] Update pipenv for BioSimulations combine-service used for low-latency online simulation

Notes on limitations

preprocess_sed_task should be re-run if any of these conditions are met:
- the model structure must be changed (e.g., additional species, reactions)
- the simulation algorithm or its parameters are changed
- additional attributes (parameters, initial conditions) need to be changed -- it's best to outline all attributes that might need to be changed upon the initial call of preprocess_sed_task
- additional variables need to be recorded -- it's best to outline all variables that might need to be recorded upon the initial call of preprocess_sed_task
Because some simulator representations of models diverge from their associated model languages, some changes that can be applied to model specifications cannot easily be applied to in-memory simulation representations of models:
- The SBML-fbc representation of FBA models diverges a little from how simulation tools represent models. In particular, SBML-fbc uses a small number of shared parameters to represent flux bounds. In contrast, simulation tools flatten these out into separate parameters for the upper and lower bound of each reaction. The shared parameters can be changed at the model specification (XML) level, but are difficult to change at the simulator level because simulators don't retain knowledge of them. Due to this divergence, we support two different mechanisms for changing FBA models:
  - exec_sed_task: supports model changes on the simulator representation of models. This should work well for Vivarium. Presently, this is limited to changing flux bounds.
  - Execution of SED-ML files and COMBINE archives: supports model changes on the XML representation of models. This supports the full set of possible changes: change attributes and add/remove/replace XML nodes.
- The Smoldyn software also diverges from Smoldyn simulation configurations. For example, the Smoldyn software does not retain information about parameter values.
  - As a result, parameters can only be edited during task preprocessing, when simulation configuration files are read.
  - In contrast, molecule counts can be set repeatedly as part of task execution.
Some simulation tools don't represent or provide ways to set initial levels
- BoolNet: appears to only hold initial levels for constant species
- GINsim: See GINsim/GINsim-python#19
For some simulation tools, repeated executions of exec_sed_task require re-parsing models
- BioNetGen: primarily a command-line tool implemented in Perl; py-perl5 could maybe be used to improve the connection to the Perl program; see RuleWorld/PyBioNetGen#22
- libSBMLSim: doesn't expose a method for parsing models; see libsbmlsim/libsbmlsim#23
- XPP: only available as a binary executable
- pyNeuroML: the actual simulator is implemented in Java beneath multiple layers of Python and Java packages. Model files are passed down through these layers. This could be improved with a Python-Java bridge, but would take some work.
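
To illustrate the SBML-fbc point above, here is a minimal sketch of why flattening loses information. The dictionaries and the `flatten_bounds` helper are hypothetical stand-ins invented for this example; they are not the SBML-fbc schema or any simulator's API.

```python
# Hypothetical, simplified stand-ins for illustration only.

# Specification level: reactions reference a few shared bound parameters by id.
fbc_parameters = {"lb_default": -1000.0, "ub_default": 1000.0, "zero": 0.0}
fbc_reactions = {
    "R1": {"lower": "zero", "upper": "ub_default"},
    "R2": {"lower": "lb_default", "upper": "ub_default"},
    "R3": {"lower": "zero", "upper": "ub_default"},
}


def flatten_bounds(parameters, reactions):
    """Resolve shared parameter ids into one numeric (lower, upper) pair per reaction."""
    return {
        rxn_id: (parameters[refs["lower"]], parameters[refs["upper"]])
        for rxn_id, refs in reactions.items()
    }


# Simulator level: only numbers remain; the shared parameter ids are gone.
flat_bounds = flatten_bounds(fbc_parameters, fbc_reactions)

# Changing one shared parameter in the specification affects every reaction
# that references it, but only after flattening (i.e., preprocessing) again.
fbc_parameters["ub_default"] = 500.0
reflattened = flatten_bounds(fbc_parameters, fbc_reactions)

# On the flattened form, the same change must be applied reaction by reaction.
flat_bounds["R1"] = (flat_bounds["R1"][0], 500.0)
```

In this sketch, editing `ub_default` reaches R1, R2, and R3 at once, while the flattened form only lets you set the bounds of one reaction at a time, which mirrors the two mechanisms described above.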

jonrkarr commented 3 years ago

@eagmon, the progress on factoring out unnecessary computations for repeated execution is summarized above.

The preprocessed information is sufficient to change values of parameters and initial conditions. Presently, more substantial changes such as adding/removing/replacing species/reactions would require re-preprocessing models.

For SBML and CellML, this follows their SED-ML conventions of using XML XPaths to address model components. Once this refactoring is done, we can work on a second, simpler way of addressing model components by their SBML/CellML ids. At least to start, this would be restricted to changing values of parameters and initial conditions. Adding/removing/replacing components would only be supported at the XML level where there's already a convention for describing such changes.
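
As a rough illustration of the two addressing styles, the sketch below changes a parameter value first through an XPath-like target and then by id. It uses only Python's standard `xml.etree.ElementTree`, whose XPath subset does not support attribute steps such as `/@value`, so the element path and attribute name are passed separately. The toy model and the helper names `set_attribute_by_path` and `set_parameter_by_id` are assumptions made for this example, not part of any BioSimulators package.

```python
import xml.etree.ElementTree as ET

SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Toy SBML-like document, invented for this example.
MODEL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfParameters>
      <parameter id="k1" value="0.1" constant="true"/>
      <parameter id="k2" value="2.0" constant="true"/>
    </listOfParameters>
  </model>
</sbml>
"""


def set_attribute_by_path(root, element_path, attribute, new_value, namespaces):
    """XPath-style addressing: locate an element relative to the root, then set an attribute."""
    element = root.find(element_path, namespaces)
    if element is None:
        raise ValueError(f"No element matches {element_path!r}")
    element.set(attribute, str(new_value))


def set_parameter_by_id(root, parameter_id, new_value):
    """Simpler id-based addressing, restricted here to parameter values."""
    tag = "{%s}parameter" % SBML_NS["sbml"]
    for parameter in root.iter(tag):
        if parameter.get("id") == parameter_id:
            parameter.set("value", str(new_value))
            return
    raise ValueError(f"No parameter with id {parameter_id!r}")


root = ET.fromstring(MODEL_XML)

# The path is relative to the <sbml> root element.
set_attribute_by_path(
    root,
    "sbml:model/sbml:listOfParameters/sbml:parameter[@id='k1']",
    "value",
    0.25,
    SBML_NS,
)
set_parameter_by_id(root, "k2", 3.0)

k1 = root.find("sbml:model/sbml:listOfParameters/sbml:parameter[@id='k1']", SBML_NS)
k2 = root.find("sbml:model/sbml:listOfParameters/sbml:parameter[@id='k2']", SBML_NS)
```

The id-based helper is shorter to call but can only express value changes, which matches the restriction proposed above; structural changes would still go through the XML level.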

eagmon commented 3 years ago

@jonrkarr -- Looks like good progress. I know from our work on biosimulators-tellurium that we used exec_sed_task and preprocess_sed_task methods -- are these same methods available for all simulators with ✅ ? I know biosimulators-cobrapy did not previously have those module attributes.

jonrkarr commented 3 years ago

Until recently, each simulator API had one method, exec_sed_task. Each API now has two methods:

exec_sed_task
preprocess_sed_task

preprocess_sed_task returns a data structure which essentially represents parsed models and a map between our standard representation of models and simulations (SED-ML/KiSAO) and each simulator's internal representation. This data structure is unique to each simulation tool.

exec_sed_task has an optional argument preprocessed_task for this preprocessed information. If the argument isn't provided, then exec_sed_task has to build this map. Providing this argument avoids any computation common to multiple repeated executions of a single model (typically with different parameters and/or initial conditions).
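
The pattern can be sketched with a toy example. Everything below is hypothetical: `preprocess_task` and `exec_task` are simplified stand-ins for preprocess_sed_task and exec_sed_task, the "model" is a three-line text format invented for this sketch, and the real methods take SED-ML task objects and have different signatures.

```python
import math

# Hypothetical toy "simulator"; names are illustrative, not a BioSimulators API.
parse_calls = {"count": 0}


def preprocess_task(model_source):
    """Parse the model once and build a map from names to internal indices."""
    parse_calls["count"] += 1
    names, values = [], []
    for line in model_source.strip().splitlines():
        name, value = line.split("=")
        names.append(name.strip())
        values.append(float(value))
    return {
        "index": {name: i for i, name in enumerate(names)},
        "defaults": values,
    }


def exec_task(model_source, changes=None, preprocessed_task=None):
    """Run exponential decay x0 * exp(-k * t_end), reusing preprocessing when given."""
    if preprocessed_task is None:
        # Without preprocessed information, the map has to be rebuilt on every call.
        preprocessed_task = preprocess_task(model_source)
    index = preprocessed_task["index"]
    state = list(preprocessed_task["defaults"])  # copy so repeated runs stay independent
    for name, value in (changes or {}).items():
        state[index[name]] = value
    x0, k, t_end = (state[index[n]] for n in ("x0", "k", "t_end"))
    return x0 * math.exp(-k * t_end)


MODEL = "x0 = 10.0\nk = 0.5\nt_end = 2.0"

# Preprocess once, then execute repeatedly with different parameter values.
preprocessed = preprocess_task(MODEL)
sweep = [
    exec_task(MODEL, {"k": k}, preprocessed_task=preprocessed)
    for k in (0.25, 0.5, 1.0)
]
calls_after_sweep = parse_calls["count"]

# Omitting preprocessed_task still works, at the cost of parsing again.
baseline = exec_task(MODEL)
```

In this sketch the three sweep runs share a single parse, while the call without `preprocessed_task` triggers another one, which is the repeated cost the optional argument is meant to avoid.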

I've implemented and pushed half of the preprocess_sed_task methods. The others are still just skeletons. I'm hoping to finish that in the next few days.

For constraint-based simulations, there's an opportunity to go further and hot-start optimizations with some solvers, such as CPLEX and Gurobi. This would require changes to the FBA packages, COBRApy and CBMpy.

jonrkarr commented 3 years ago

The updated Docker image is released. The entrypoint now opens an IPython shell in the Pipenv environment with all of the simulation tools.

docker pull ghcr.io/biosimulators/biosimulators:0.0.2
docker run -it --rm ghcr.io/biosimulators/biosimulators:0.0.2

The only two standardized tools that aren't included are

OpenCOR: Installation is complicated and requires Python 3.7. This group is working toward a more composable simulation library with the core simulation functionality separated from the GUI.
VCell: No Python API available. The developers are thinking about creating a Python API. They have an old API that could be a good starting point.

The updated simulation tools are deployed on the main RunBioSimulations simulation service. They will be updated soon on the low latency/low performance service.

More documentation (e.g., Jupyter notebook) is still coming.

Separate task preprocessing from simulation execution #399
