rjuenemann closed 9 months ago
Good questions. Writing multiple sim_data variants could be very difficult because a lot of code (including analysis classes) would need to adapt to that change.
I went down similar design paths to support operons on/off. Trying alternate Parca-level variants ran up against a lot of dependencies, and the need to compare operons=on results against operons=off led to having parallel output trees at the top.
A possibly useful idea:
Pass the generation number out of divide_cell() via a save_inherited_state() kwarg. Then the daughter cell's setDaughterInitialConditions() would get it from load_inherited_state() and update a new field of sim.internal_state.
Map the generation number to the gene switch(es), e.g. on when .generation > 8. Record the switch(es) in sim.internal_state (in case the mapping ever changes).
Set the rna_synth_prob etc. parameters, say from constants in this new "gene variant" code or by copying from new, gene-variant parameters in sim_data. Then update sim_data.process.transcription.rna_synth_prob et al., and probably record the updated values in sim_data or sim.internal_state or a listener.

I suggest minimizing the work required to make all this work and be modifiable, without worrying about the runtime cost.
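A minimal sketch of the switch-and-record step, assuming plain dicts as stand-ins for sim_data and sim.internal_state. The names apply_new_gene_switch, INDUCTION_GENERATION, and NEW_GENE_PARAMS are hypothetical and not part of the existing code; the parameter values are illustrative only.

```python
# Hypothetical stand-ins: plain dicts replace the real sim_data and
# sim.internal_state objects, whose actual structure is not shown here.
INDUCTION_GENERATION = 8  # assumed threshold, from the "generation > 8" example

# Assumed parameter values for the new genes in each switch state.
NEW_GENE_PARAMS = {
    False: {"rna_synth_prob": 0.0, "translation_efficiency": 0.0},
    True: {"rna_synth_prob": 0.01, "translation_efficiency": 1.0},
}


def apply_new_gene_switch(sim_data, internal_state, generation):
    """Set new-gene parameters for this generation and record what was used."""
    induced = generation > INDUCTION_GENERATION
    params = NEW_GENE_PARAMS[induced]

    # Update the parameters that the simulation processes would read.
    sim_data.update(params)

    # Record the switch and a copy of the values, so that analysis does not
    # depend on the generation-to-switch mapping staying the same.
    internal_state["new_gene_induced"] = induced
    internal_state["new_gene_params"] = dict(params)
    return induced
```

Recording a copy of the values alongside the switch keeps a complete record per generation even if the constants are edited later.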
Hi @1fish2 - Thank you for your quick response and for sharing your perspective! This idea sounds promising to me. I'll look into it and follow up as questions arise.
Hi @1fish2 - I started looking into this idea and I think it still sounds promising. One initial question: where would we check the daughter generation number and perform the sim_data update? Should we do this in divide_cell() or setDaughterInitialConditions() (or someplace else)? Both of those functions seem to have access to sim_data, so I am not sure how much it matters.
divide_cell() has access to the mother cell's sim_data, so it can pass an incremented generation number to the daughter cell via inherited_state.
setDaughterInitialConditions() has access to the daughter cell's sim_data, so it can update the daughter's sim_data as a function of its generation number.
Alternatively, you might want to be able to turn genes on/off more often than once per generation.
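The once-per-generation handoff described above can be sketched as follows. This is an illustration under stated assumptions, not the existing implementation: divide_cell_sketch and set_daughter_initial_conditions_sketch are hypothetical stand-ins for the real functions, the inherited state is a plain dict, and the threshold of 8 is taken from the example in this thread.

```python
INDUCTION_GENERATION = 8  # assumed threshold for turning the new genes on


def divide_cell_sketch(mother_generation):
    """Mother side: build the inherited state for each of the two daughters."""
    inherited = {"generation": mother_generation + 1}
    return dict(inherited), dict(inherited)


def set_daughter_initial_conditions_sketch(daughter_sim_data, inherited_state):
    """Daughter side: read the generation number and update its own sim_data."""
    generation = inherited_state["generation"]
    daughter_sim_data["new_gene_induced"] = generation > INDUCTION_GENERATION
    return generation


def follow_lineage(n_divisions):
    """Follow one daughter per division and report the switch per generation."""
    generation = 0
    induced_by_generation = {}
    for _ in range(n_divisions):
        first_daughter, _second_daughter = divide_cell_sketch(generation)
        daughter_sim_data = {}  # stand-in for the daughter's own sim_data
        generation = set_daughter_initial_conditions_sketch(
            daughter_sim_data, first_daughter)
        induced_by_generation[generation] = daughter_sim_data["new_gene_induced"]
    return induced_by_generation
```

The mother only increments and forwards the counter; the decision about what to change lives entirely on the daughter side, where the daughter's sim_data is available.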
I’m working on adding genetic engineering capabilities to the model. I previously added an option to the ParCa to add new genes to the E. coli chromosome. I have been running large batches of simulations on Sherlock to investigate the impact of promoter and ribosome binding site strength of new genes on product production and cell health. To do so in a more automated/scaled way, I developed a variant that will generate the different parameter combinations and modify the relevant sim data accordingly (models/ecoli/sim/variants/new_gene_expression_and_translation_efficiency.py).

What I would like to do now is run simulations where new gene expression is turned off (e.g. sim_data.process.transcription.rna_synth_prob = 0 and sim_data.process.translation.translation_efficiencies_by_monomer = 0 for the new genes) for the first few generations (say 8) and then turned on (e.g. sim_data.process.transcription.rna_synth_prob = 0.01 and sim_data.process.translation.translation_efficiencies_by_monomer = 1) for the next generations. This would allow us to observe what happens in the transition when new gene expression is induced.

@ggsun mentioned that @tahorst has run transition variants before. One example I found in the repo is models/ecoli/sim/variants/add_one_aa_shift.py. It looks to me like these shifts are time-based, and modify components of the external_state via timelines.

It’s not clear to me how we could do something similar in this case. I’m not sure if the timelines could or should be extended to sim_data that is not external_state. A time-based shift would be workable, but a generation-based shift is preferred for ease of analysis and comparability. We still need to run all parameter combinations (usually < 60) for new gene transcription and translation, like before.

One idea that seems like it could be feasible is to have the variant save multiple copies of sim_data to use for the shifts. Then through runscripts/fireworks/fw_queue.py, runscripts/manual/runSim.py, and runscripts/manual/runDaughter.py we could map the generation number to the appropriate sim_data file for the set of simulation parameters. This would allow us to maintain existing sim_data access processes within the sims. However, I’m not sure how efficient, elegant, or generalizable this implementation would be. We’d be saving entire copies of sim_data even though we are just modifying about 6-10 parameters for a handful of genes.

This could be related to issues #602 and #618. Ideally, the solution would be generalizable to multiple/other types of variants, multiple shifts, etc. (e.g. new gene induction and environmental shifts, or uninduced to new genes induced then back to uninduced). We would want to save all the modified parameters used during each period (i.e. not completely overwrite them) to have complete records and use later in analysis plots.
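One way to picture the generalizable, multi-shift case is a small schedule of generation-based periods, each carrying only the handful of overridden parameters rather than a full copy of sim_data. The sketch below is an illustration only: SHIFT_SCHEDULE, overrides_for_generation, and record_lineage are hypothetical names, generations are assumed to be zero-indexed (so the first 8 generations are 0 through 7), and the values and the second shift at generation 16 are made up for the example.

```python
from bisect import bisect_right

# Hypothetical schedule: (first generation of the period, parameter overrides).
# Uninduced for generations 0-7, induced for 8-15, uninduced again from 16.
SHIFT_SCHEDULE = [
    (0, {"rna_synth_prob": 0.0, "translation_efficiency": 0.0}),
    (8, {"rna_synth_prob": 0.01, "translation_efficiency": 1.0}),
    (16, {"rna_synth_prob": 0.0, "translation_efficiency": 0.0}),
]


def overrides_for_generation(schedule, generation):
    """Return (period index, overrides) for the period containing generation."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, generation) - 1
    if index < 0:
        raise ValueError(f"generation {generation} precedes the schedule")
    return index, dict(schedule[index][1])


def record_lineage(schedule, n_generations):
    """Keep one record per generation instead of overwriting parameters."""
    history = []
    for generation in range(n_generations):
        period, overrides = overrides_for_generation(schedule, generation)
        history.append({"generation": generation, "period": period, **overrides})
    return history
```

Because each generation gets its own record, analysis plots can later look up exactly which parameter values were in force during each period.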