Closed dr3y closed 3 years ago
Just a comment about class naming: given that both classes `Component` and `Mechanisms` already exist, I'd call the new class something different, e.g. `ComponentConverter` or `Component.from_component()`
I agree this is the next step in BioCRNpyler. I've been thinking about how to do this in a general way a lot and have two different use cases in mind:
I also agree with Zoltan's point about names - we should not call these Mechanisms but something different. One suggestion: `ComponentEnumerator` is the class that does this. TxTlExplorer would be an example of a `ComponentEnumerator`.
Here are a few high level questions I have been thinking through and don't have good answers for:
I want to stress that I think it is very important we design this part of the software very carefully before rushing to implement anything - we're running up against some fundamentally very challenging theoretical/computational issues and having the right framework in place will matter a lot for how usable these features are.
some answers i thought of:
`Mixture` level because an integrase is a protein that floats around and reacts with any DNA it sees. But `DNA_parts` can exist at the `DNA_construct` level (is that the same as `Component` level?) since they only interact with other members of the same `DNA_construct`.
`RNA_construct`. One issue I can see about structuring it like a reaction is that something else needs to know what the inputs and outputs are. Part of what `TXTL_explorer` does is figure out what the inputs and outputs of such reactions would be, and I think the plan should be to make `TXTL_explorer` obsolete, right?
`CombinatorialPromoter` should happen at the same level as `DNA_construct` enumeration (using the same code ideally). Maybe if we actually implement "Operator" as Christian suggested as a separate part we can do that (although I don't like this for many reasons)
@dr3y some more thoughts:
integrases and `DNA_construct` enumeration are not the same for two reasons:
`DNA_part`s on a `DNA_construct` which only care about their own location.
`DNA_construct`s which must go through the `Species`/`Reaction` compilation process again. This is where the recursion limit comes in, I guess. It's like a recursive `DNA_part` whereas a normal `DNA_part` terminates. [EDIT] although, `RNA_construct`s are an exception to this. However, it still isn't recursive with `RNA_construct`s. They lead to proteins or not and that's it.
Your suggestion about OrderedPolymer enumeration is interesting. In my mind the way that `DNA_part`s work is kind of a special case, BECAUSE they don't care about binding that happens anywhere else. I imagine a general case combinatorial enumeration would have arbitrary conditions on it such as "only this combination of bound things leads to this other bound configuration" and that's kinda what `CombinatorialPromoter` does, with the `tx_capable_complex`. Sort of like a templated binding. "A bound to B then binds to A bound to C to make ((BA)(AC))" but B and C can be parts of an `OrderedPolymer`
Here is a diagram of how I imagine this will work. (Integrase enumeration is inspired by how I am doing it in the `integrases` branch. Doesn't mean it's the best way or the way we should do it)
Updated flow chart, with a diagram of the combinatorial enumeration that is currently used in DNA_construct. Black arrows represent the order of operations, and green arrows represent retrieving data. Usually things will retrieve data from things above them, which means the data has already been generated. The red arrow represents a recursive loop because a function needs to use data that it will itself generate.
Here is another diagram which illustrates the reactions that need to be created by integrases. Green arrows are normal binding/unbinding reactions, blue arrows are transcription reactions, purple arrows are added by integrases, and orange arrows represent reactions which need to be carried over to the new species created by integrases.
Somehow the integrase site `DNA_part` (or the `Integrase` global component) needs to know about these additional (orange) reactions. Also I have not represented intermolecular reactions here, which of course are also important (or not? depending on what you are trying to model)
actually I think the tetramer species should be very easy to produce in the "regular part compilation" step if we allow for fixing certain parts of the combinatorial polymer (require that B and P sites are bound together, but combinatorialize everything else). However, if you want to make tetramers with two different DNA molecules I think it won't work since the combinatorializing happens at the `DNA_construct` level.
One idea I had was that instead of having `DNA_construct`s as `Component`s which go into a `Mixture` you instead have `DNA_construct`s associated into like a `reactable_set` (or something) which does `DNA_part` enumeration for ALL the `DNA_construct`s together and `update_species` at the very end. This would allow for `DNA_part`s which know about other `DNA_construct`s, but would possibly be very clunky
First - impressive diagram! I'm amazed you turn these out so quickly! It definitely helps visualize what we are trying to do.
Here is a thought: OrderedPolymerSpecies will have a "combinatorialization" (but let's use a real word for this) function. When two OrderedPolymerSpecies bind together (or a single OrderedPolymerSpecies "folds" into a hairpin) it becomes a PolymerNetworkSpecies which will have a similar "combinatorialization" function.
Here is a first pass at specifying PolymerNetworkSpecies.
The concept of making the combinatorial complexes implies that you have a list of independent options and you can have any combination of these options. For example, a promoter could be bound to RNAP or it could be not bound. Likewise an integrase site can be bound and it can be not bound or it can be in a tetramer with another site. So then we need to come up with a list of options, such as `[unbound, bound]` or `[unbound, bound, tetramer]` and then have a function which goes through all these options. A list of options is generated when you run a mechanism on a part. Just using the contents of an `OrderedPolymerSpecies` or something to come up with these options is insufficient.
The challenge I'm seeing here is that integrases involve a set of options which are not independent any more. Now the tetramer option requires that something else is occupied too. This becomes very tricky when you consider the possibility of multiple integrases with multiple sites and multiple pieces of DNA. I think this would need some sort of graph theory type of solution, and I don't know what that would be. Perhaps the real thing we need is a `CombinatorialSet` which is a set of options that has not been resolved into the individual `OrderedComplexSpecies` yet. That way you can go through these options and resolve them or evaluate them for validity or something like that.
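To make the "unresolved set of options, then check validity" idea concrete, here is a minimal plain-Python sketch. It is not BioCRNpyler code: the site names, the `options` dictionary, and the `is_valid` rule (a tetramer must involve exactly two sites) are assumptions chosen only to show how coupled options could be filtered after a naive combinatorial expansion.

```python
from itertools import product

# Hypothetical per-site options. In the discussion above these would be
# produced by running a mechanism on each part.
options = {
    "promoter": ["unbound", "bound"],
    "attB": ["unbound", "bound", "tetramer"],
    "attP": ["unbound", "bound", "tetramer"],
}


def enumerate_states(site_options):
    """Yield every combination of per-site options as a dict (unresolved set)."""
    sites = list(site_options)
    for combo in product(*(site_options[s] for s in sites)):
        yield dict(zip(sites, combo))


def is_valid(state):
    """Assumed coupling rule: a tetramer always involves exactly two sites."""
    n_tetramer = sum(1 for s in state.values() if s == "tetramer")
    return n_tetramer in (0, 2)


all_states = list(enumerate_states(options))
valid_states = [s for s in all_states if is_valid(s)]
print(len(all_states), "combinations,", len(valid_states), "pass the coupling rule")
```

With more than two integrase sites the rule would also have to say which sites are paired with which, and that pairing is where the graph-style bookkeeping mentioned above would come in.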
I'm not sure I quite understand the problem you are pointing out - but maybe the solution is to have a class DNA_network which is a PolymerNetwork, in other words, DNA_network is to DNA_construct as PolymerNetworkSpecies is to OrderedPolymerSpecies.
either way neither DNA_construct nor OrderedPolymerSpecies know about what the different combinatorial options should be. That is a job for mechanisms
Notes on Compilation Order:
1. Component Enumeration Steps Produce More Components
2. Mechanism Steps Produce Species and Reactions
3. Combinatorial Enumeration (Such as Binding) is part of 2. Can this be packaged as a helper function in OrderedPolymerSpecies somehow?
Notes on Combinatorial Enumeration:
`OrderedPolymerSpecies.combinatorial_enumeration([list of OrderedPolymerSpecies of the same length])`
Notes on Binding Polymers together
`class PolymerNetwork(OrderedPolymer)`: A set of coupled OrderedPolymers.
`class DNA_construct_network(DNA_construct)`: A set of coupled DNA_constructs
`class PolymerNetworkSpecies(OrderedPolymerSpecies)`: This class contains a list of 1+ OrderedPolymerSpecies and a list of 1+ ComplexSpecies which contain 2+ OrderedMonomerSpecies (and other species are allowed as well).
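One possible shape for the data held by such a network is sketched below. Everything here is hypothetical (`PolymerSketch`, `PolymerNetworkSketch`, `add_link`, `is_intermolecular` are illustration-only names, not BioCRNpyler classes); the sketch only shows polymers stored as ordered lists plus links that each join two or more monomers, addressed by (polymer index, position).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (polymer index, monomer position) identifies one monomer in the network.
MonomerRef = Tuple[int, int]


@dataclass
class PolymerSketch:
    name: str
    monomers: List[str]


@dataclass
class PolymerNetworkSketch:
    polymers: List[PolymerSketch]
    # Each link stands in for a complex joining 2+ monomers,
    # possibly located on different polymers.
    links: List[List[MonomerRef]] = field(default_factory=list)

    def add_link(self, refs: List[MonomerRef]) -> None:
        if len(refs) < 2:
            raise ValueError("a link must join at least two monomers")
        for p, m in refs:
            if not 0 <= p < len(self.polymers):
                raise IndexError(f"no polymer with index {p}")
            if not 0 <= m < len(self.polymers[p].monomers):
                raise IndexError(f"polymer {p} has no position {m}")
        self.links.append(list(refs))

    def is_intermolecular(self, link_index: int) -> bool:
        """True when the link joins monomers on more than one polymer."""
        return len({p for p, _ in self.links[link_index]}) > 1


plasmid_1 = PolymerSketch("plasmid_1", ["pconst", "attB", "GFP"])
plasmid_2 = PolymerSketch("plasmid_2", ["attP", "RFP"])
network = PolymerNetworkSketch([plasmid_1, plasmid_2])
network.add_link([(0, 1), (1, 0)])  # attB on plasmid_1 paired with attP on plasmid_2
print(network.is_intermolecular(0))
```

A single polymer folding onto itself fits the same structure: the link would simply reference two positions on the same polymer index.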
How does this all add up? (Using integrases as an example)
Classes
I am starting to work on this a little bit.
The first thing that I have made is a `TxTlExplorer_CE` which is a component enumerator version of `TxTlExplorer`. By default this is packaged together with `DNA_construct` and `RNA_construct` objects.
This can be found in my branch here https://github.com/dr3y/BioCRNPyler/tree/component_enumerator
Here is an example code that demonstrates its functionality:
```python
# Assumes these names are exported at the package top level on this branch.
from biocrnpyler import RegulatedPromoter, Promoter, RBS, CDS, Terminator, DNA_construct

ptet = RegulatedPromoter("ptet", ["tetr"], leak=True)  # this is a repressible promoter
pconst = Promoter("pconst")  # constitutive promoter (defined but not used below)
utr1 = RBS("UTR1")  # regular RBS
gfp = CDS("GFP", "GFP")  # first one is the name of the `dna_part`, second one is the name of the protein that is made
t16 = Terminator("t16")  # a terminator stops transcription
construct_1 = DNA_construct([[ptet, "forward"], [utr1, "forward"], [gfp, "forward"], [t16, "forward"]])
y = construct_1.enumerate_components()
print(y)
# [rna = UTR1_GFP_t16, GFP_2, DNA_construct = ptet_UTR1_GFP_t16]
print(y[2][0].transcript.pretty_print())
# rna[rna[UTR1-forward]:rna[GFP-forward]:rna[t16-forward]]
print(y[2][0].protein[0].pretty_print())
# protein[GFP]
```
roughly the same as issue #11
Compilation Overview:
Design Specifications:
The two new classes:
LocalComponentEnumerator() --> general method "recursion(depth = N)" which calls self.enumerate(....) internally and stops when no new things are returned OR depth > N. --> self.enumerate method has no inputs
GlobalComponentEnumerator(ComponentEnumerator) --> self.enumerate method takes a list of Components as input
Mixture has a general recursion(depth = N) method which calls all the GlobalComponentEnumerators
Local Example: DNAassembly has a TxExplorer LocalComponentEnumerator which creates RNAassemblies that have TlExplorer LocalComponentEnumerators (and other things). TlExplorer returns new components which do not have Enumerators, so the recursion ends.
Simple Splicing would be RNAs with Splicing LocalComponentEnumerators.
Trans-splicing (RNA that can get spliced onto another RNA) would be GlobalComponentEnumeration.
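The local/global split and the depth-limited recursion can be sketched in a few lines of plain Python. This is only an illustration under assumptions: `ComponentSketch`, `recursion`, and `trans_splice` are hypothetical stand-ins rather than the BioCRNpyler classes, local enumeration is modelled as a method with no inputs, and a global enumerator is modelled as a function that receives the list of all components.

```python
class ComponentSketch:
    """Hypothetical component; `children` plays the role of a local enumerator."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def enumerate_components(self):
        # Local enumeration: no inputs, only looks at this component.
        return list(self._children)


def recursion(components, global_enumerators=(), depth=4):
    """Enumerate until nothing new is returned or `depth` passes have run."""
    seen = {c.name: c for c in components}
    for _ in range(depth):
        current = list(seen.values())
        candidates = []
        for comp in current:
            candidates.extend(comp.enumerate_components())
        for enumerate_global in global_enumerators:
            # Global enumeration: sees every component found so far.
            candidates.extend(enumerate_global(current))
        new = [c for c in candidates if c.name not in seen]
        if not new:
            break
        for c in new:
            seen[c.name] = c
    return list(seen.values())


# Local example: DNA -> RNA -> protein, where the protein has no enumerator.
protein = ComponentSketch("GFP")
rna = ComponentSketch("rna_UTR1_GFP", children=[protein])
dna = ComponentSketch("dna_ptet_UTR1_GFP_t16", children=[rna])
local_names = [c.name for c in recursion([dna], depth=5)]


# Global example: a toy trans-splicing rule that needs two RNAs to be present.
def trans_splice(all_components):
    names = {c.name for c in all_components}
    if {"rna_A", "rna_B"} <= names:
        return [ComponentSketch("rna_A_B")]
    return []


global_names = [
    c.name
    for c in recursion(
        [ComponentSketch("rna_A"), ComponentSketch("rna_B")],
        global_enumerators=[trans_splice],
    )
]
print(local_names)
print(global_names)
```

Deduplicating by name is a simplification made for the sketch; a real implementation would need a proper notion of component equality to decide when "no new things are returned".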
This has been added via Component enumeration, which is now part of the compilation process: Mixtures can call `Component.enumerate_components` to produce new Components.
Currently we have `Species` which can be involved in `Mechanisms`. However, wouldn't it be nice to have `ComponentMechanisms` which can convert `Components` into other `Components`?
Some examples include:
Compilation is something like this:
`Component` definitions (including `DNA_construct`)
`ComponentMechanisms` on the `Components` in the `Mixture` (generates special `Reactions` and new `Components`)
a. `ComponentMechanisms` are contained inside certain `Component`s
`Component`s and original `Component`s and feed them into the `Mixture`
`update_species`/`update_reactions` generation