Modularize fitter runs and allow caching of sim_data components

CovertLab / wcEcoli

Whole Cell Model of E. coli

Modularize fitter runs and allow caching of sim_data components #413

Open prismofeverything opened 5 years ago

prismofeverything commented 5 years ago

There has been some discussion here about cutting down the time required to run the fitter. Depending on what they are working on, some people have to run the fitter over and over again to develop features, even though most of the output stays the same (only the part tied to the feature they are working on really changes). This can add up to a large time sink.

The proposal is to provide the ability to run only facets of the fitter, reusing an existing fitter output and updating only the subtree that has been recomputed. So it could be invoked like so:

python runscripts/manual/runFitter.py --facet translation

or something similar. Each facet would be linked to a particular function in the fitter; running the facet would invoke that function, update the existing sim_data object, and repickle it.
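A minimal sketch of how such a facet mechanism could be wired up, assuming sim_data can be stood in for by a plain dict. The `FACETS` registry, the `facet` decorator, `fit_translation`, and `run_facet` are all hypothetical names for illustration; none of them exist in the fitter, and the "fitting" step is a placeholder computation.

```python
import pickle
from pathlib import Path

# Hypothetical registry mapping a facet name to the function that recomputes
# that subtree of sim_data. Nothing like this exists in the fitter today.
FACETS = {}


def facet(name):
    """Register a function as the recompute step for one facet."""
    def register(func):
        FACETS[name] = func
        return func
    return register


@facet("translation")
def fit_translation(sim_data):
    # Placeholder standing in for the real fitting code: it only rewrites
    # the "translation" subtree and leaves everything else untouched.
    sim_data["translation"] = {"elongation_rate": 2 * sim_data["base_rate"]}


def run_facet(name, sim_data_path):
    """Load an existing pickle, recompute one facet, and repickle in place."""
    path = Path(sim_data_path)
    with path.open("rb") as f:
        sim_data = pickle.load(f)
    FACETS[name](sim_data)
    with path.open("wb") as f:
        pickle.dump(sim_data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return sim_data
```

A command-line flag such as the proposed `--facet translation` would then reduce to a call like `run_facet("translation", path_to_existing_pickle)`.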

I don't know how urgent this is, but anything that cuts down on how long it takes to iterate while developing features will be beneficial to the overall development process.

1fish2 commented 5 years ago

Sounds great.

One idea: Add a test for the facet mechanism to verify that the output after an incremental change matches the output of a fresh run, and be sure to run the test after iterating on any particular module to gain confidence in the results.
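One way such a test might compare the two outputs, sketched under the assumption that both can be reduced to nested dicts, lists, and numbers. The real sim_data object holds richer types (arrays, unit-bearing values), which would need their own comparison branches. `diff_sim_data` is a hypothetical helper, not an existing function.

```python
import math


def diff_sim_data(fresh, incremental, path="sim_data", rel_tol=1e-9):
    """Return the paths at which two nested structures differ."""
    if isinstance(fresh, dict) and isinstance(incremental, dict):
        diffs = []
        for key in sorted(set(fresh) | set(incremental), key=str):
            child = f"{path}[{key!r}]"
            if key not in fresh or key not in incremental:
                diffs.append(child)  # key present on only one side
            else:
                diffs.extend(
                    diff_sim_data(fresh[key], incremental[key], child, rel_tol))
        return diffs
    if isinstance(fresh, (list, tuple)) and isinstance(incremental, (list, tuple)):
        if len(fresh) != len(incremental):
            return [path]
        diffs = []
        for i, (a, b) in enumerate(zip(fresh, incremental)):
            diffs.extend(diff_sim_data(a, b, f"{path}[{i}]", rel_tol))
        return diffs
    if isinstance(fresh, float) and isinstance(incremental, float):
        # Tolerate floating-point noise between the two runs.
        return [] if math.isclose(fresh, incremental, rel_tol=rel_tol) else [path]
    return [] if fresh == incremental else [path]
```

The test would then assert that `diff_sim_data(fresh_output, incremental_output)` is an empty list, and print the returned paths on failure so the diverging subtree is easy to find.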

tahorst commented 5 years ago

Although the idea is great, it does seem a little infeasible. Everything in the cell is interrelated, so changes to translation will affect transcription, small molecules, degradation, regulation, etc. The fitter is iterative, so each of these aspects gets updated and affects the others during each iteration. It's not apparent to me how any of them could be cached (other than what is already cached or things that are not computationally intensive in the first place). It would require a major rework of the fitter to incorporate this if it is even possible to disentangle the different facets from each other.

Do you have a specific instance of something that you think could be cached? The only possibility I could think of is caching certain conditions and only performing the calculations if we know something has changed that would affect them. It still seems unlikely that we would be able to determine this ahead of time though.
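The "only recompute a condition if something that affects it has changed" idea could look roughly like the sketch below, assuming the relevant inputs for a condition can be collected into a dict of picklable values. Knowing which inputs are relevant is exactly the hard part raised above, and this sketch does not solve it. `input_key` and `fit_condition_cached` are hypothetical names.

```python
import hashlib
import pickle

# In-memory cache keyed by (condition name, hash of that condition's inputs).
_cache = {}


def input_key(inputs):
    """Hash the inputs so that a change in any of them changes the key."""
    # Sorting by key makes the hash independent of dict insertion order.
    # Pickle bytes are not guaranteed identical for all equal objects, so
    # the worst case is an unnecessary recompute, not a stale result.
    payload = pickle.dumps(sorted(inputs.items()), protocol=4)
    return hashlib.sha256(payload).hexdigest()


def fit_condition_cached(condition, inputs, fit_condition):
    """Run fit_condition only when this condition's inputs have changed."""
    key = (condition, input_key(inputs))
    if key not in _cache:
        _cache[key] = fit_condition(condition, inputs)
    return _cache[key]
```

The cache is only as trustworthy as the `inputs` dict: any value the calculation reads but that is left out of `inputs` would produce a stale result without warning.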

prismofeverything commented 5 years ago

Good point. To achieve this, we would have to encode a dependency tree recording which elements of the fitter depend on which others, and expire the dependents of anything that changes. I agree that for elements that are reused over and over again this wouldn't be feasible (you would just end up running the whole fitter again), but there are a number of other areas where these changes are independent, i.e. most things downstream of transcription/translation.
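Expiring dependents amounts to a graph traversal. The sketch below assumes the dependencies are written down as a dict from each facet to the facets it reads from. The facet names and edges in `DEPENDS_ON` are invented for illustration and do not describe the fitter's actual dependencies.

```python
from collections import deque

# Illustrative only: each facet maps to the facets whose output it reads.
DEPENDS_ON = {
    "transcription": [],
    "translation": ["transcription"],
    "metabolism": ["translation"],
    "degradation": ["transcription"],
}


def stale_facets(changed, depends_on):
    """Return the changed facet plus everything downstream of it."""
    # Invert the edges: for each facet, which facets read from it?
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk over dependents, expiring each one exactly once.
    stale = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale
```

With these made-up edges, changing "translation" expires only "translation" and "metabolism", while changing "transcription" expires every facet, which is the "you would just end up running the whole fitter again" case.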

I have talked to @heejochoi about drawing out a kind of dependency graph of the fitter, and I think this would give a good indication of which subsets of sim_data would be feasible to run independently. My cursory examination revealed that, though certain values are reused over and over again, many of the derived values are only used in the simulation.

This came up while talking to @heejochoi and @eagmon independently, both of whom are working in the fitter in addition to other things.

prismofeverything commented 5 years ago

> It would require a major rework of the fitter to incorporate this if it is even possible to disentangle the different facets from each other.

Yeah, definitely not going to happen immediately, but there are already plans to rework the fitter based on a dependency network and if/when that occurs this feature would be straightforward to implement.