CovertLab / wcEcoli

Whole Cell Model of E. coli

Variants and Parameter Shifts #1376

Closed rjuenemann closed 9 months ago

rjuenemann commented 1 year ago

I’m working on adding genetic engineering capabilities to the model. I previously added an option to the ParCa to add new genes to the E. coli chromosome. I have been running large batches of simulations on Sherlock to investigate the impact of promoter and ribosome binding site strength of new genes on product production and cell health. To do so in a more automated/scaled way, I developed a variant that will generate the different parameter combinations and modify the relevant sim data accordingly (models/ecoli/sim/variants/new_gene_expression_and_translation_efficiency.py).

What I would like to do now is run simulations where new gene expression is turned off (e.g. sim_data.process.transcription.rna_synth_prob = 0 and sim_data.process.translation.translation_efficiencies_by_monomer = 0 for the new genes) for the first few generations (say 8) and then turned on (e.g. sim_data.process.transcription.rna_synth_prob = 0.01 and sim_data.process.translation.translation_efficiencies_by_monomer = 1) for the following generations. This would allow us to observe what happens during the transition when new gene expression is induced.
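To make the intended behavior concrete, here is a minimal sketch of a generation-based switch. It is not wcEcoli code: NewGeneParams, params_for_generation, and the 0-indexed generation convention are assumptions made for illustration, and only the off/on values come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewGeneParams:
    """Hypothetical container for the two new-gene parameters being shifted."""
    rna_synth_prob: float
    translation_efficiency: float


# Values taken from the description above; the names are illustrative only.
OFF = NewGeneParams(rna_synth_prob=0.0, translation_efficiency=0.0)
ON = NewGeneParams(rna_synth_prob=0.01, translation_efficiency=1.0)


def params_for_generation(generation, induction_generation=8, off=OFF, on=ON):
    """Return the parameters to use for a 0-indexed generation number.

    Generations 0 .. induction_generation - 1 are uninduced; later ones are induced.
    """
    if generation < 0:
        raise ValueError("generation must be non-negative")
    return off if generation < induction_generation else on
```

With the default of 8, generations 0 through 7 run uninduced and generation 8 is the first induced one.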

@ggsun mentioned that @tahorst has run transition variants before. One example I found in the repo is models/ecoli/sim/variants/add_one_aa_shift.py. It looks to me like these shifts are time-based, and modify components of the external_state via timelines.

It’s not clear to me how we could do something similar in this case. I’m not sure if the timelines could or should be extended to sim_data that is not external_state. A time-based shift would be workable, but a generation-based shift is preferred for ease of analysis and comparability. We still need to run all parameter combinations (usually < 60) for new gene transcription and translation, like before.

One idea that seems like it could be feasible is to have the variant save multiple copies of sim_data to use for the shifts. Then through runscripts/fireworks/fw_queue.py, runscripts/manual/runSim.py, and runscripts/manual/runDaughter.py we could map the generation number to the appropriate sim_data file for the set of simulation parameters. This would allow us to maintain existing sim_data access processes within the sims. However, I’m not sure how efficient, elegant, or generalizable this implementation would be. We’d be saving entire copies of sim_data even though we are just modifying about 6-10 parameters for a handful of genes.

This could be related to issues #602 and #618. Ideally, the solution would be generalizable to other types of variants, multiple shifts, etc. (e.g. new gene induction combined with environmental shifts, or uninduced to induced and then back to uninduced). We would want to save all the modified parameters used during each period (i.e. not completely overwrite them) so that we have complete records to use later in analysis plots.
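One way to picture a generalizable, fully recorded schedule is sketched below. Everything here is hypothetical: the SCHEDULE structure, the key names, and the generation-16 shift back to uninduced are assumptions for illustration, not an existing wcEcoli format. The idea is that each period stores only the handful of overridden parameters, and the whole schedule can be written out alongside the simulation output for later analysis.

```python
import bisect
import json

# Hypothetical schedule: each period begins at start_generation and lists the
# parameter overrides that apply from then until the next period begins.
SCHEDULE = [
    {"start_generation": 0, "label": "uninduced",
     "overrides": {"rna_synth_prob": 0.0, "translation_efficiency": 0.0}},
    {"start_generation": 8, "label": "induced",
     "overrides": {"rna_synth_prob": 0.01, "translation_efficiency": 1.0}},
    {"start_generation": 16, "label": "uninduced",
     "overrides": {"rna_synth_prob": 0.0, "translation_efficiency": 0.0}},
]


def period_for_generation(schedule, generation):
    """Return the period whose start_generation is the latest one <= generation."""
    ordered = sorted(schedule, key=lambda period: period["start_generation"])
    starts = [period["start_generation"] for period in ordered]
    index = bisect.bisect_right(starts, generation) - 1
    if index < 0:
        raise ValueError("no period covers generation %d" % generation)
    return ordered[index]


def schedule_to_json(schedule):
    """Serialize every period so no parameter set is overwritten or lost."""
    ordered = sorted(schedule, key=lambda period: period["start_generation"])
    return json.dumps(ordered, indent=2)
```

Storing overrides rather than whole sim_data copies keeps the record small, at the cost of needing code that applies the overrides when each generation starts.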

1fish2 commented 1 year ago

Good questions. Writing multiple sim_data variants could be very difficult given lots of code (including analysis classes) needing to adapt to that change.

I went down similar design paths to support operons on/off. Trying alternate Parca-level variants ran up against a lot of dependencies, and the need to compare operons=on results against operons=off led to having parallel output trees at the top.

A possibly useful idea:

I suggest minimizing the effort required to get all this working and keep it modifiable, without worrying about the runtime cost.

rjuenemann commented 1 year ago

Hi @1fish2 - Thank you for your quick response and for sharing your perspective! This idea sounds promising to me. I'll look into it and follow up as questions arise.

rjuenemann commented 1 year ago

Hi @1fish2 - I started taking a look into this idea and I think it still sounds promising. One initial question: where would we check the daughter generation number and perform the sim_data update? Should we do this in divide_cell() or setDaughterInitialConditions() (or someplace else)? It seems like both of those functions have access to sim_data, so I am not sure to what degree it matters.

1fish2 commented 1 year ago

divide_cell() has access to the mother cell's sim_data. So it can pass an incremented generation number to the daughter cell via inherited_state.

setDaughterInitialConditions() has access to the daughter cell's sim_data, so it can update the daughter's sim_data as a function of its generation number.
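The division of labor described above could look roughly like the following sketch. These are stand-in functions operating on plain dicts, not the real divide_cell() or setDaughterInitialConditions() signatures; the "generation" key, the parameter key names, and the induction_generation argument are all assumptions for illustration.

```python
def divide_cell_sketch(mother_generation):
    """Stand-in for divide_cell(): build the state the daughter inherits.

    The mother passes an incremented generation number to the daughter.
    """
    return {"generation": mother_generation + 1}


def set_daughter_initial_conditions_sketch(sim_data, inherited_state,
                                           induction_generation=8):
    """Stand-in for setDaughterInitialConditions().

    Returns a copy of the daughter's sim_data (a plain dict here) with the
    new-gene parameters set as a function of the inherited generation number.
    """
    induced = inherited_state["generation"] >= induction_generation
    updated = dict(sim_data)  # copy so the caller's dict is left unmodified
    updated["new_gene_rna_synth_prob"] = 0.01 if induced else 0.0
    updated["new_gene_translation_efficiency"] = 1.0 if induced else 0.0
    return updated
```

For example, a mother in generation 7 would hand generation 8 to her daughter, and the daughter would then start with the induced values.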

Alternatively, you might want to be able to turn genes on/off more often than once per generation.