Model the environment as a simulation State object

jmason42 commented 6 years ago

Right now we have several approaches for modeling environmental shifts. As the types of shifts and number of processes grow, this is going to be hard to maintain. I think modeling the environment as a new type of state is a natural choice. It would standardize environmental interaction across processes (up to the limit of process-specific code). Uniquely, this state would be mostly static (apart from environmental shifts). This would also be a natural point for interaction with an agent-based model. I can imagine some other advantages and utilities, but that's the gist.

eagmon commented 6 years ago

I agree with adding the environment as a state that is updated every time step and integrated with the other processes. We could also apply perturbations directly to this object.

Should this start as a new process, e.g. environment.py?

(Sent by email in reply to jmason42's comment above, which was posted Sun, May 6, 2018 at 5:52 PM.)

-- Eran Agmon, PhD Department of Bioengineering, Stanford University http://eagmon.tumblr.com/

eagmon commented 6 years ago

Re-reading and thinking about this now: the environmental state might not need a dedicated process at first. We could just update the state through the individual processes, perhaps only metabolism, which will pull nutrients from and excrete molecules into the environment?

tahorst commented 6 years ago

This is basically the same conversation as in #127. I think we need to move the environment out of the polypeptide_elongation process. Listeners seem best suited only to writing data. Creating a new state object would probably be best. We would have to think about how to implement updates to it because, unlike molecule counts (bulk/unique molecules), it is not updated by a simple reduction function but by a discrete shift. This shouldn't be too difficult, but it would require a couple of new functions and a new type of state variable.
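To make the reduction-versus-shift distinction concrete, here is a minimal sketch. The function names `merge_count_deltas` and `merge_environment_shifts` and the condition names are hypothetical, not part of wcEcoli. Count changes from several processes can simply be summed, but a shift replaces the current condition outright, so two different shift requests in one time step are a conflict rather than something to add up.

```python
from typing import Dict, List, Optional


def merge_count_deltas(deltas: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-process count changes by summation (a reduction)."""
    total: Dict[str, int] = {}
    for delta in deltas:
        for molecule, change in delta.items():
            total[molecule] = total.get(molecule, 0) + change
    return total


def merge_environment_shifts(requests: List[Optional[str]]) -> Optional[str]:
    """Resolve discrete shift requests for one time step.

    Each entry is the condition name a process asked to shift to, or None for
    no request. Shifts cannot be added together, so two different requests in
    the same step are treated as a conflict instead of being merged.
    """
    requested = {name for name in requests if name is not None}
    if len(requested) > 1:
        raise ValueError(f"conflicting environment shifts: {sorted(requested)}")
    return requested.pop() if requested else None
```

Raising on conflict is just one possible policy (an assumption here); a priority order among processes would be another.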

jmason42 commented 6 years ago

@eagmon Yes, I don't think a process is needed right now, and I don't think a process strictly tied to the environment is correct, either. Processes tightly coupled to states are fragile; this was a flaw in the M. gen model. In the short term, I would avoid updating the environment entirely, under the assumption that the environment is well approximated as an infinitely large reservoir. However, we might still want to record metabolic outputs for the sake of total mass balance and eventual dynamic environments*; at the moment, molecules on the boundary of the system simply appear from and disappear into nothingness.
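A minimal sketch of the "infinite reservoir plus bookkeeping" idea, assuming a hypothetical `ReservoirEnvironment` class and mM units (neither is from the codebase): concentrations never change, but every exchange with the cell is accumulated so mass balance can be checked later.

```python
from typing import Dict


class ReservoirEnvironment:
    """Environment treated as an infinitely large reservoir.

    Concentrations never change, but every exchange with the cell is
    accumulated so that total mass balance can be checked later.
    """

    def __init__(self, concentrations_mM: Dict[str, float]) -> None:
        self._concentrations_mM = dict(concentrations_mM)
        # Net molecules moved into the environment (negative = taken up by the cell).
        self.cumulative_exchange: Dict[str, int] = {}

    def concentration(self, molecule: str) -> float:
        return self._concentrations_mM.get(molecule, 0.0)

    def record_exchange(self, exchange: Dict[str, int]) -> None:
        """Record counts secreted (+) or imported (-) by the cell this time step."""
        for molecule, count in exchange.items():
            self.cumulative_exchange[molecule] = (
                self.cumulative_exchange.get(molecule, 0) + count
            )
```

Because the concentrations are fixed, recording exchanges does not change what processes see, which keeps the short-term behavior identical while making the boundary fluxes visible.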

*Dynamic in the sense that they are emergent properties of the cellular processes, and not a hardcoded shift consequent of time.

@tahorst Agreed, this doesn't precisely fit into our current State model (which, incidentally, will need to change a lot anyway if we move forward with de-partitioning). One option (probably not the right one) is to refactor many features of State into a subclass, InternalState, and add a new subclass, ExternalState. However, I'd first hack together the interface and see what accessor patterns emerge; that's effectively how I designed the existing framework. I think this would also be a great opportunity to revise the language of the existing interfaces.
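For illustration only, the InternalState/ExternalState split might look roughly like this. The base class `State` here is a stand-in rather than wcEcoli's actual State, and the method names and condition names are assumptions. The point is that both subclasses share an `update` entry point while differing in how an update is applied.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class State(ABC):
    """Hypothetical common base: something processes can read and request updates to."""

    @abstractmethod
    def update(self, request: Any) -> None:
        """Apply one time step's worth of requested changes."""


class InternalState(State):
    """Cell-internal counts, updated by adding deltas (like bulk molecules)."""

    def __init__(self, counts: Dict[str, int]) -> None:
        self.counts = dict(counts)

    def update(self, request: Dict[str, int]) -> None:
        new_counts = {m: self.counts.get(m, 0) + d for m, d in request.items()}
        if any(c < 0 for c in new_counts.values()):
            raise ValueError("update would make a count negative")
        self.counts.update(new_counts)


class ExternalState(State):
    """Environment, updated by a discrete shift to one of a set of named conditions."""

    def __init__(self, conditions: Dict[str, Dict[str, float]], current: str) -> None:
        if current not in conditions:
            raise KeyError(current)
        self.conditions = conditions
        self.current = current

    def update(self, request: str) -> None:
        if request not in self.conditions:
            raise KeyError(request)
        self.current = request

    def concentrations(self) -> Dict[str, float]:
        return self.conditions[self.current]
```

As the comment suggests, the accessor patterns that actually emerge from use should drive the final split; this only shows the shape of one option.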

eagmon commented 6 years ago

I created a new development branch for refactoring the environment as a simulation state object: https://github.com/CovertLab/wcEcoli/commit/7008a27ce9cff10b52935f70d111d011bb44a41d.

I made this a state in wholecell/state, but @jmason42 recommended making it a state in models/ecoli. Since there are currently no states in models/ecoli, I thought wholecell/state would be a better place to start. I am open to either option.

environment.py uses bulk_molecules.py as a template, and I stripped out many operations related to partitioning. I have kept the views, though; perhaps they can be used to interface with processes.
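One way views could serve as the process interface, sketched with hypothetical `EnvironmentState` and `EnvironmentView` classes (not the actual classes in environment.py or bulk_molecules.py): each process holds a view over only the molecules it needs, and the view reads through to the shared state so it always sees current values.

```python
from typing import Dict, List


class EnvironmentState:
    """Hypothetical environment state holding medium concentrations."""

    def __init__(self, concentrations: Dict[str, float]) -> None:
        self._concentrations = dict(concentrations)

    def get(self, molecule: str) -> float:
        return self._concentrations[molecule]

    def set_concentration(self, molecule: str, value: float) -> None:
        if molecule not in self._concentrations:
            raise KeyError(molecule)
        self._concentrations[molecule] = value

    def view(self, molecule_ids: List[str]) -> "EnvironmentView":
        missing = [m for m in molecule_ids if m not in self._concentrations]
        if missing:
            raise KeyError(f"unknown environment molecules: {missing}")
        return EnvironmentView(self, molecule_ids)


class EnvironmentView:
    """Read-only window onto the molecules a single process cares about."""

    def __init__(self, state: EnvironmentState, molecule_ids: List[str]) -> None:
        self._state = state
        self._ids = list(molecule_ids)

    def concentrations(self) -> List[float]:
        # Reads through to the state, so the view always reflects current values.
        return [self._state.get(m) for m in self._ids]
```

Keeping views read-only in this sketch means processes cannot shift the environment on their own, which matches the goal of standardizing environmental interaction.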

The next steps I see are bringing in the current environment definition (currently in reconstruction/ecoli/flat/condition/nutrient) and pointing the various processes that use environmental conditions to this state.

tahorst commented 6 years ago

I think if it is general enough, it should be in wholecell. A couple of style comments: we don't need the shebang since this is just a class definition, and a few of the imports are unused and can be cleaned up. Also, it might be a good time to refactor to more standard function names like setCounts, increaseCounts (or incrementCounts), and decreaseCounts instead of countsIs, countsInc, and countsDec.
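A sketch of what the proposed names could look like on a hypothetical `EnvironmentCounts` class. The old names appear only in comments; the real bulk_molecules.py methods operate on arrays, so their signatures may differ from this dict-based version.

```python
from typing import Dict, Iterable, List


class EnvironmentCounts:
    """Hypothetical environment state using the proposed method names."""

    def __init__(self, molecule_ids: Iterable[str]) -> None:
        self._counts: Dict[str, int] = {m: 0 for m in molecule_ids}

    def counts(self, molecule_ids: Iterable[str]) -> List[int]:
        return [self._counts[m] for m in molecule_ids]

    def setCounts(self, values: Dict[str, int]) -> None:  # formerly countsIs
        self._apply({m: v - self._counts[m] for m, v in values.items()})

    def increaseCounts(self, deltas: Dict[str, int]) -> None:  # formerly countsInc
        self._apply(deltas)

    def decreaseCounts(self, deltas: Dict[str, int]) -> None:  # formerly countsDec
        self._apply({m: -d for m, d in deltas.items()})

    def _apply(self, deltas: Dict[str, int]) -> None:
        # Validate every change before mutating, so a bad request leaves state untouched.
        new = {m: self._counts[m] + d for m, d in deltas.items()}
        negative = [m for m, c in new.items() if c < 0]
        if negative:
            raise ValueError(f"counts would go negative: {negative}")
        self._counts.update(new)
```

Routing all three public methods through one validated `_apply` keeps the renamed API consistent with the existing behavior of refusing negative counts.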

On a first pass, I had a couple of thoughts about implementation. First, we will need to assign a volume to the environment we are considering; otherwise, tracking counts alone probably won't give us enough information (e.g. concentrations). Additionally, since we are considering sharing this environment across multiple cells, we probably want to track the relevant cells and might not care so much about the processes themselves. If we are tracking cells, we might need a function to "add" new cells to the environment, for example at division. Will the environment interface with processes directly, or with a simulation that in turn interfaces with its own processes?
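Why a volume matters: counts only become concentrations once you divide by Avogadro's number and the volume. Below is a minimal sketch with a hypothetical `SharedEnvironment` class (names, units, and the division method are assumptions) that also tracks which cells share the environment and swaps a mother for her daughters at division.

```python
from typing import Dict, Set, Tuple

AVOGADRO = 6.02214076e23  # 1/mol, exact SI value


class SharedEnvironment:
    """Hypothetical environment with an explicit volume, shared by several cells."""

    def __init__(self, volume_L: float, counts: Dict[str, int]) -> None:
        if volume_L <= 0:
            raise ValueError("volume must be positive")
        self.volume_L = volume_L
        self.counts = dict(counts)
        self.cell_ids: Set[str] = set()

    def concentration_mM(self, molecule: str) -> float:
        # counts -> moles -> mol/L, then x1000 for mmol/L
        moles = self.counts.get(molecule, 0) / AVOGADRO
        return moles / self.volume_L * 1e3

    def add_cell(self, cell_id: str) -> None:
        if cell_id in self.cell_ids:
            raise ValueError(f"cell {cell_id} is already in the environment")
        self.cell_ids.add(cell_id)

    def divide_cell(self, mother_id: str, daughter_ids: Tuple[str, str]) -> None:
        """Replace a dividing mother cell with its two daughters."""
        if mother_id not in self.cell_ids:
            raise KeyError(mother_id)
        if self.cell_ids.intersection(daughter_ids):
            raise ValueError("daughter IDs must be new")
        self.cell_ids.remove(mother_id)
        self.cell_ids.update(daughter_ids)
```

For scale: 1 µm^3 is 10^-15 L, so about 602,000 molecules in that volume corresponds to roughly 1 mM.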

I think your last point is correct: we'll need to store the type/name of the environment to keep compatibility with the current code, and that is a good place to start.

jmason42 commented 6 years ago

Do you want a simulation to incorporate multiple cells, or keep one simulation as one cell and develop a way to pass messages between simulations? The former sounds easier to implement but less scalable. Have you considered one super-simulation (e.g. a culture volume) with one or more subordinate WCM simulations?
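A rough, single-process sketch of the super-simulation option, assuming a hypothetical `CellSimulation` interface with a `step(environment, dt)` method that returns an exchange "message"; `Culture` and `ConstantUptakeCell` are illustrative stand-ins, not existing code. Running cells as separate computational processes would replace the in-memory calls with real message passing while keeping the same exchange pattern.

```python
from typing import Dict, List, Protocol


class CellSimulation(Protocol):
    """Hypothetical interface each subordinate cell simulation would expose."""

    def step(self, environment: Dict[str, int], dt: float) -> Dict[str, int]:
        """Advance by dt seconds; return molecules exported (+) or imported (-)."""
        ...


class Culture:
    """Hypothetical super-simulation: one culture volume, many cell simulations.

    Cells never talk to each other directly. Each step, every cell receives a
    copy of the same start-of-step environment and returns its exchange; the
    culture applies the summed exchanges afterwards, so the result does not
    depend on the order in which cells are stepped.
    """

    def __init__(self, counts: Dict[str, int], cells: List[CellSimulation]) -> None:
        self.counts = dict(counts)
        self.cells = list(cells)

    def step(self, dt: float) -> None:
        snapshot = dict(self.counts)
        total: Dict[str, int] = {}
        for cell in self.cells:
            for molecule, change in cell.step(dict(snapshot), dt).items():
                total[molecule] = total.get(molecule, 0) + change
        new_counts = {m: self.counts.get(m, 0) + d for m, d in total.items()}
        short = [m for m, c in new_counts.items() if c < 0]
        if short:
            # This sketch does not arbitrate competition; it just refuses the step.
            raise ValueError(f"cells requested more than is available: {short}")
        self.counts.update(new_counts)


class ConstantUptakeCell:
    """Toy stand-in for a whole-cell simulation: imports glucose at a fixed rate."""

    def __init__(self, uptake_per_s: int) -> None:
        self.uptake_per_s = uptake_per_s

    def step(self, environment: Dict[str, int], dt: float) -> Dict[str, int]:
        return {"GLC": -int(self.uptake_per_s * dt)}
```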

eagmon commented 6 years ago

@jmason42 We should consider all of those options and choose one that is scalable but also flexible. Ease of implementation would be nice, but I would rather make sure the framework is robust. Most simulations will still track only a single cell. I also think that once we get beyond 8 cells, we will want to include more ABM-like "zombie" cells that have no internal processes but behave according to the observed behavior of other simulations. Maybe @1fish2 has ideas about this.
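A minimal sketch of such a "zombie" cell, under the assumption that the observed behavior of full simulations has been summarized as per-second exchange rates; the `ZombieCell` class and its inputs are hypothetical. Each step it replays one recorded rate profile chosen at random, so it costs a few bytes of memory instead of a full simulation.

```python
import random
from typing import Dict, List


class ZombieCell:
    """Hypothetical ABM-style cell with no internal processes.

    Instead of simulating metabolism, it replays per-second exchange rates
    observed in earlier full whole-cell simulations, chosen at random each step.
    """

    def __init__(self, observed_rates: List[Dict[str, float]], seed: int = 0) -> None:
        if not observed_rates:
            raise ValueError("need at least one observed rate profile")
        self.observed_rates = observed_rates
        self.rng = random.Random(seed)

    def step(self, dt: float) -> Dict[str, int]:
        """Return molecules exported (+) or imported (-) over dt seconds."""
        rates = self.rng.choice(self.observed_rates)
        return {m: round(rate * dt) for m, rate in rates.items()}
```

Seeding a per-cell `random.Random` keeps zombie behavior reproducible across runs, which helps when comparing them against full simulations.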

jmason42 commented 6 years ago

The most scalable option would be separate computational processes, but scalable doesn't mean "easy to develop"; often it's quite the opposite. Again, I think it's foolish to start working toward cell-cell interactions without first doing both the biological and the computational back-of-the-envelope estimates: e.g. how many cells, how large a volume, how much memory?

eagmon commented 6 years ago

Back-of-the-envelope calculations: a single E. coli has an approximate volume of 1 µm^3. A vacuole can be as small as 15 µm^3, so a fully packed small vacuole might contain up to 15 cells. If each WC simulation takes about 1 GB of memory, we might need up to 15 GB for the fully packed small vacuole. But I think we can aim for 8 cells (8 GB of memory), and after that consider adding ABM-like cells to drastically increase the cell count while keeping memory use down.
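The estimate above reduces to two lines of arithmetic, written here as a small helper so the inputs can be varied. The function name is hypothetical, and `packing_fraction` is an added assumption (default 1.0, the "fully packed" upper bound used above).

```python
from typing import Tuple


def estimate_packed_cells(
    vacuole_um3: float,
    cell_um3: float = 1.0,
    gb_per_simulation: float = 1.0,
    packing_fraction: float = 1.0,
) -> Tuple[int, float]:
    """Return (cell count, memory in GB) for a vacuole filled with full simulations.

    packing_fraction=1.0 reproduces the "fully packed" upper bound; lower values
    account for space that rod-shaped cells cannot fill (an added assumption).
    """
    cells = int(vacuole_um3 * packing_fraction // cell_um3)
    return cells, cells * gb_per_simulation


# Numbers from the estimate above: 15 µm^3 vacuole, 1 µm^3 cells, ~1 GB per simulation.
print(estimate_packed_cells(15.0))  # (15, 15.0)
```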

CovertLab / wcEcoli, issue #146