biosimulators / Biosimulators

Registry of containerized biosimulation tools that support a standard command-line interface
https://biosimulators.org
MIT License

Separate task preprocessing from simulation execution #399

Closed (jonrkarr closed 3 years ago)

jonrkarr commented 3 years ago

Notes on limitations

jonrkarr commented 3 years ago

@eagmon, the progress on factoring out unnecessary computations for repeated execution is summarized above.

The preprocessed information is sufficient to change the values of parameters and initial conditions. Currently, more substantial changes, such as adding, removing, or replacing species or reactions, require preprocessing the model again.

For SBML and CellML, this follows their SED-ML conventions of using XML XPaths to address model components. Once this refactoring is done, we can work on a second, simpler way of addressing model components by their SBML/CellML ids. At least to start, this would be restricted to changing values of parameters and initial conditions. Adding/removing/replacing components would only be supported at the XML level where there's already a convention for describing such changes.
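As a rough illustration of addressing a model component at the XML level, the sketch below changes the value of one SBML parameter with Python's standard library. The toy model, the parameter ids k1 and k2, and the helper set_parameter_value are all hypothetical and are not part of any BioSimulators package. A SED-ML target would normally be a full XPath ending in an attribute such as @value; ElementTree supports only a subset of XPath, so here the element is located first and the attribute is set separately.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map used to resolve "sbml:" in the paths below.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Tiny hypothetical SBML model with two parameters (illustration only).
MODEL_XML = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfParameters>
      <parameter id="k1" value="0.1" constant="true"/>
      <parameter id="k2" value="2.0" constant="true"/>
    </listOfParameters>
  </model>
</sbml>
"""


def set_parameter_value(root, parameter_id, new_value):
    """Set the 'value' attribute of the SBML parameter with the given id.

    Hypothetical helper: it mimics an XPath target such as
    .../sbml:parameter[@id='k1']/@value by finding the element and
    then writing the attribute.
    """
    path = ".//sbml:parameter[@id='{}']".format(parameter_id)
    matches = root.findall(path, SBML_NS)
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one parameter with id {!r}, found {}".format(
                parameter_id, len(matches)
            )
        )
    matches[0].set("value", str(new_value))


root = ET.fromstring(MODEL_XML)
set_parameter_value(root, "k1", 0.5)
```

Addressing by SBML/CellML id, as proposed above, would amount to hiding the path construction behind the id, which is roughly what the helper does for parameters.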

eagmon commented 3 years ago

@jonrkarr -- Looks like good progress. I know from our work on biosimulators-tellurium that we used the exec_sed_task and preprocess_sed_task methods -- are these same methods available for all simulators marked with ✅? I know biosimulators-cobrapy did not previously have those module attributes.

jonrkarr commented 3 years ago

Until recently, each simulator API had one method, exec_sed_task. Each API now has two methods:

preprocess_sed_task returns a data structure which essentially represents parsed models and a map between our standard representation of models and simulations (SED-ML/KiSAO) and each simulator's internal representation. This data structure is unique to each simulation tool.

exec_sed_task has an optional argument preprocessed_task for this preprocessed information. If the argument isn't provided, then exec_sed_task has to build this map itself. Providing the argument avoids repeating the computation that is common to multiple executions of a single model (typically with different parameters and/or initial conditions).

I've implemented and pushed half of the preprocess_sed_task methods. The others are still just skeletons. I'm hoping to finish that in the next few days.

For constraint-based simulations, there's an opportunity to go further by hot-starting optimizations with solvers such as CPLEX and Gurobi. This would require changes to the FBA packages, COBRApy and CBMpy.

jonrkarr commented 3 years ago

The updated Docker image is released. The entrypoint now opens an IPython shell in the Pipenv environment with all of the simulation tools.

docker pull ghcr.io/biosimulators/biosimulators:0.0.2
docker run -it --rm ghcr.io/biosimulators/biosimulators:0.0.2

The only two standardized tools that aren't included are

The updated simulation tools are deployed on the main RunBioSimulations simulation service. They will be updated soon on the low latency/low performance service.

More documentation (e.g., a Jupyter notebook) is still coming.