BuildACell / bioCRNpyler

A modular compiler for biological chemical reaction networks
BSD 3-Clause "New" or "Revised" License

Component level mechanisms #170

Closed: dr3y closed this issue 3 years ago

dr3y commented 4 years ago

Currently we have Species which can be involved in Mechanisms. However, wouldn't it be nice to have ComponentMechanisms which can convert Components into other Components?

Some examples include:

Compilation is something like this:

  1. Component definitions (including DNA_construct)
  2. Run ComponentMechanisms on the Components in the Mixture (generates special Reactions and new Components)
     a. ComponentMechanisms are contained inside certain Components
  3. Take the generated Components together with the original Components and feed them into the Mixture
  4. Normal update_species/update_reactions generation

zoltuz commented 4 years ago

Just a comment about class naming: given that the classes Component and Mechanism already exist, I'd call the new class something different, e.g. ComponentConverter or Component.from_component().

WilliamIX commented 4 years ago

I agree this is the next step in BioCRNpyler. I've been thinking a lot about how to do this in a general way and have two different use cases in mind:

  1. Components which "copy" variations of themselves. An example of this is DNA_construct which copies itself into RNA_construct. DNA_parts in DNA_construct also copy themselves over and over again in order to enumerate species with combinatorial binding patterns.
  2. Components which enumerate new Components based on themselves and other Components in the Mixture. Examples of this are integrases, which take one or more DNA_constructs to produce novel DNA_constructs. In general, this case will have to be recursive, with some limit on the recursion depth.

I also agree with Zoltan's point about names: we should not call these Mechanisms but something different. One suggestion: `ComponentEnumerator` is the class that does this. TxTlExplorer would be an example of a ComponentEnumerator.

Here are a few high level questions I have been thinking through and don't have good answers for:

  1. Do ComponentEnumerators exist inside Mixture or inside Components or both? In particular, for case (1) above, enumeration could happen at the Component level. For case (2) above, enumeration has to happen at the Mixture level. Maybe we need two kinds of objects?
  2. How much can this kind of enumeration be automated and standardized? For example, maybe enumeration looks like a reaction, but the inputs and outputs are Component classes instead of Species. For example, DNA_construct takes a Promoter, Terminator --> RNA_construct. Integrases: Integrase_Site_A + Integrase_Site_B --> DNA_construct(s). However, this might not be the right abstraction, because I am not sure we can make enumeration efficient in general this way.
  3. One part of Component Enumeration involves enumerating the Species inside Components. For example, DNA_construct "multiplies" each DNA_part together to get all the combinatorial bound/unbound states. Can this procedure be generalized? Is this effectively a kind of Component Enumeration, or something different?

I want to stress that I think it is very important we design this part of the software very carefully before rushing to implement anything - we're running up against some fundamentally very challenging theoretical/computational issues and having the right framework in place will matter a lot for how usable these features are.

dr3y commented 4 years ago

Some answers I thought of:

  1. Integrases must occur at the Mixture level because an integrase is a protein that floats around and reacts with any DNA it sees. But DNA_parts can exist at the DNA_construct level (is that the same as Component level?) since they only interact with other members of the same DNA_construct.
  2. Transcription must know about more than just a promoter and terminator, because it needs to know about everything in between, since all of that becomes part of the RNA_construct. One issue I can see with structuring it like a reaction is that something else needs to know what the inputs and outputs are. Part of what TXTL_explorer does is figure out what the inputs and outputs of such reactions would be, and I think the plan should be to make TXTL_explorer obsolete, right?
  3. It should be possible. For example, CombinatorialPromoter should happen at the same level as DNA_construct enumeration (ideally using the same code). Maybe if we actually implement "Operator" as a separate part, as Christian suggested, we can do that (although I don't like this for many reasons).

WilliamIX commented 4 years ago

@dr3y some more thoughts:

  1. I guess my question here really is: "are Integrases the same kind of Component Enumeration as DNA_construct" (in an abstract sense), or are they fundamentally different (i.e., might they require different function calls in different places)?
  2. Due to the OrderedMonomer and OrderedPolymer classes, given two OrderedMonomers it is easy to find everything in between. This is what TXTL explorer does; I'm mostly just thinking about how to generalize the logic to other things. On that note, I think TXTL_explorer will be refactored (i.e., its code might move to other places), but I doubt its logic will become completely obsolete.
  3. I wonder if this kind of combinatoric enumeration can be a function of OrderedPolymer (can we make CombinatorialPromoter an OrderedPolymerSpecies?)
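A minimal sketch of the "find everything in between" logic, assuming a DNA_construct is flattened to a list of (part_type, name, direction) tuples and that transcription runs from each forward promoter to the next forward terminator (or to the end of the construct if there is none). `find_transcripts` is a hypothetical helper for illustration, not the OrderedMonomer/OrderedPolymer API or TxTlExplorer's actual code.

```python
from typing import List, Tuple

# Hypothetical flat representation of a DNA_construct: (part_type, name, direction)
Part = Tuple[str, str, str]


def find_transcripts(construct: List[Part]) -> List[List[str]]:
    """For each forward promoter, return the names of the parts transcribed
    downstream of it, up to and including the first forward terminator.

    Simplifications: reverse promoters do not start transcripts, and if no
    terminator is found the transcript runs to the end of the construct.
    """
    transcripts = []
    for i, (part_type, _name, direction) in enumerate(construct):
        if part_type != "promoter" or direction != "forward":
            continue
        transcript = []
        for next_type, next_name, next_direction in construct[i + 1:]:
            transcript.append(next_name)  # everything in between joins the RNA
            if next_type == "terminator" and next_direction == "forward":
                break
        transcripts.append(transcript)
    return transcripts


construct = [("promoter", "ptet", "forward"), ("rbs", "UTR1", "forward"),
             ("cds", "GFP", "forward"), ("terminator", "t16", "forward")]
print(find_transcripts(construct))  # [['UTR1', 'GFP', 't16']]
```

A linear scan is enough here because the parts are already ordered; a version built on OrderedPolymer would walk monomer positions the same way.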
dr3y commented 4 years ago

Integrases and DNA_construct enumeration are not the same, for two reasons:

Your suggestion about OrderedPolymer enumeration is interesting. In my mind, the way that DNA_parts work is kind of a special case, BECAUSE they don't care about binding that happens anywhere else. I imagine a general combinatorial enumeration would have arbitrary conditions on it, such as "only this combination of bound things leads to this other bound configuration", and that's kind of what CombinatorialPromoter does with the tx_capable_complex. Sort of like a templated binding: "A bound to B then binds to A bound to C to make ((BA)(AC))", but B and C can be parts of an OrderedPolymer.

Here is a diagram of how I imagine this will work. (Integrase enumeration is inspired by how I am doing it in the integrases branch. Doesn't mean it's the best way or the way we should do it)

[Image: diagram of the proposed component enumeration flow]

dr3y commented 4 years ago

Updated flow chart, with a diagram of the combinatorial enumeration that is currently used in DNA_construct. Black arrows represent the order of operations, and green arrows represent retrieving data. Usually things will retrieve data from things above them, which means the data has already been generated. The red arrow represents a recursive loop, because a function needs to use data that it will itself generate.

[Image: updated flow chart]

dr3y commented 4 years ago

Here is another diagram which illustrates the reactions that need to be created by integrases. Green arrows are normal binding/unbinding reactions, blue arrows are transcription reactions, purple arrows are added by integrases, and orange arrows represent reactions which need to be carried over to the new species created by integrases.

Somehow the integrase site DNA_part (or the Integrase global component) needs to know about these additional (orange) reactions. Also, I have not represented intermolecular reactions here, which of course are also important (or not, depending on what you are trying to model).

[Image: diagram of the reactions created by integrases]
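To make the integrase-created (purple) reactions concrete, here is a minimal sketch of the new constructs an integrase could produce from one attB/attP pair on a single linear DNA_construct. It assumes the usual serine-integrase rule: attB x attP gives the hybrid sites attL and attR; sites in the same orientation excise the DNA between them, and sites in opposite orientations invert it. The `recombine` helper and the (name, direction) tuples are hypothetical, not the code in the integrases branch, and the sketch ignores both the carried-over (orange) reactions and intermolecular recombination.

```python
from typing import List, Tuple

Part = Tuple[str, str]  # (name, direction); direction is "forward" or "reverse"


def flip(part: Part) -> Part:
    """Reverse the orientation of a single part."""
    name, direction = part
    return (name, "reverse" if direction == "forward" else "forward")


def recombine(construct: List[Part], b: int, p: int) -> List[List[Part]]:
    """Products of an integrase acting on attB (index b) and attP (index p)
    on one linear construct. Simplifying assumptions: b < p, attB is forward."""
    if not (b < p and construct[b][1] == "forward"):
        raise ValueError("sketch only handles a forward attB upstream of attP")
    left, middle, right = construct[:b], construct[b + 1:p], construct[p + 1:]
    if construct[p][1] == "forward":
        # Same orientation -> excision. With attB upstream, attL stays on the
        # linear DNA and attR ends up on the excised circle with the middle parts.
        linear = left + [("attL", "forward")] + right
        circle = [("attR", "forward")] + middle
        return [linear, circle]
    # Opposite orientation -> inversion of everything between the two sites.
    inverted = [flip(part) for part in reversed(middle)]
    return [left + [("attL", "forward")] + inverted + [("attR", "reverse")] + right]


construct = [("pconst", "forward"), ("attB", "forward"), ("GFP", "forward"),
             ("attP", "reverse"), ("t16", "forward")]
print(recombine(construct, 1, 3))
# [[('pconst', 'forward'), ('attL', 'forward'), ('GFP', 'reverse'), ('attR', 'reverse'), ('t16', 'forward')]]
```

Each returned construct would then need its own DNA_part enumeration, which is where the orange reactions would have to be regenerated.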

dr3y commented 4 years ago

Actually, I think the tetramer species should be very easy to produce in the "regular part compilation" step if we allow for fixing certain parts of the combinatorial polymer (require that the B and P sites are bound together, but combinatorialize everything else). However, if you want to make tetramers with two different DNA molecules, I think it won't work, since the combinatorializing happens at the DNA_construct level.

One idea I had was that instead of having DNA_constructs as Components which go into a Mixture, you instead group DNA_constructs into something like a reactable_set, which does DNA_part enumeration for ALL the DNA_constructs together and calls update_species at the very end. This would allow for DNA_parts which know about other DNA_constructs, but it would possibly be very clunky.

WilliamIX commented 4 years ago

First - impressive diagram! I'm amazed you turn these out so quickly! It definitely helps visualize what we are trying to do.

Here is a thought: OrderedPolymerSpecies will have a "combinatorialization" function (but let's use a real word for this). When two OrderedPolymerSpecies bind together (or a single OrderedPolymerSpecies "folds" into a hairpin), the result becomes a PolymerNetworkSpecies, which will have a similar "combinatorialization" function.

Here is a first pass at specifying PolymerNetworkSpecies.

  1. What is it? PolymerNetworkSpecies is a class for storing networks of one or more OrderedPolymerSpecies bound together in particular conformations (including a single OrderedPolymerSpecies bound to itself, such as in an RNA hairpin).
  2. What does it contain? PolymerNetworkSpecies contains a list of OrderedPolymerSpecies and a list of ComplexSpecies each of which contains two or more OrderedMonomerSpecies from the OrderedPolymerSpecies inside the PolymerNetworkSpecies. Note that these ComplexSpecies can represent binding sites and can contain auxiliary Species (such as proteins that might mediate binding).
  3. What does it do? It allows for equality to be tested (by alphabetizing the above lists and comparing two PolymerNetworkSpecies). It can do combinatorialization by calling the combinatorialization of its internal OrderedPolymerSpecies and then combining them together.
  4. What do we need to figure out?
     4.a. What is this thing's string representation for SBML?
     4.b. Is it a subclass of OrderedPolymerSpecies or something else?
     4.c. How does this class interface with the Complex function for general binding reactions?
     4.d. Should OrderedComplexSpecies be allowed to represent OrderedPolymerSpecies binding as well as ComplexSpecies?
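The equality test in point 3 can be sketched by sorting both lists into a canonical form. This assumes, purely for illustration, that each polymer is identified by a name, each monomer by (polymer_name, position), and each binding complex by its monomers plus any auxiliary species; `PolymerNetworkSketch` is a hypothetical stand-in, not the proposed class.

```python
from typing import List, Tuple

# Hypothetical representation: a monomer is (polymer_name, position); a binding
# site is (monomers, auxiliary_species), e.g. two att sites bridged by an integrase.
Monomer = Tuple[str, int]
Site = Tuple[Tuple[Monomer, ...], Tuple[str, ...]]


class PolymerNetworkSketch:
    """Illustrative stand-in for the proposed PolymerNetworkSpecies."""

    def __init__(self, polymers: List[str], complexes: List[Site]):
        self.polymers = polymers
        self.complexes = complexes

    def canonical(self):
        """Alphabetize both lists so construction order does not affect equality."""
        polymers = tuple(sorted(self.polymers))
        complexes = tuple(sorted(
            (tuple(sorted(monomers)), tuple(sorted(aux)))
            for monomers, aux in self.complexes))
        return polymers, complexes

    def __eq__(self, other):
        if not isinstance(other, PolymerNetworkSketch):
            return NotImplemented
        return self.canonical() == other.canonical()

    def __hash__(self):
        # Hash the canonical form so equal networks hash equally.
        return hash(self.canonical())


a = PolymerNetworkSketch(["dna1", "dna2"], [((("dna1", 2), ("dna2", 0)), ("Int",))])
b = PolymerNetworkSketch(["dna2", "dna1"], [((("dna2", 0), ("dna1", 2)), ("Int",))])
hairpin = PolymerNetworkSketch(["rna"], [((("rna", 0), ("rna", 5)), ())])
print(a == b, a == hairpin)  # True False
```

One caveat of this naive approach: two identical copies of the same polymer cannot be told apart by name, so a general version would need a graph-canonicalization step. The canonical tuple is also one possible starting point for question 4.a.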
dr3y commented 4 years ago

The concept of making the combinatorial complexes implies that you have a list of independent options and you can have any combination of these options. For example, a promoter could be bound to RNAP or it could be not bound. Likewise an integrase site can be bound and it can be not bound or it can be in a tetramer with another site. So then we need to come up with a list of options, such as [unbound, bound] or [unbound, bound, tetramer] and then have a function which goes through all these options. A list of options is generated when you run a mechanism on a part. Just using the contents of an OrderedPolymerSpecies or something to come up with these options is insufficient.

The challenge I'm seeing here is that integrases involve a set of options which are not independent anymore. Now the tetramer option requires that something else is occupied too. This becomes very tricky when you consider the possibility of multiple integrases with multiple sites and multiple pieces of DNA. I think this would need some sort of graph theory type of solution, and I don't know what that would be. Perhaps what we really need is a CombinatorialSet, which is a set of options that has not yet been resolved into the individual OrderedComplexSpecies. That way you can go through these options and resolve them, or evaluate them for validity.
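A minimal brute-force sketch of such a CombinatorialSet, assuming each site lists the options a mechanism gave it and dependencies are expressed as rules that accept or reject a full assignment of states. The class name comes from the comment above; its methods, the site names, and the `tetramer_needs_partner` rule are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

# Hypothetical types: each site maps to the states a mechanism says it can take,
# and each rule inspects one full assignment of states and accepts or rejects it.
Options = Dict[str, List[str]]
Rule = Callable[[Dict[str, str]], bool]


class CombinatorialSet:
    """A set of options that has not yet been resolved into concrete species."""

    def __init__(self, options: Options, rules: Sequence[Rule] = ()):
        self.options = options
        self.rules = list(rules)

    def resolve(self) -> List[Dict[str, str]]:
        """Enumerate every combination of options and keep the valid ones.

        Brute force: all combinations are generated before filtering, so the
        cost grows exponentially with the number of sites.
        """
        sites = sorted(self.options)
        valid = []
        for combo in product(*(self.options[site] for site in sites)):
            assignment = dict(zip(sites, combo))
            if all(rule(assignment) for rule in self.rules):
                valid.append(assignment)
        return valid


def tetramer_needs_partner(state: Dict[str, str]) -> bool:
    # attB may only be in a tetramer if attP is too, and vice versa.
    return (state["attB"] == "tetramer") == (state["attP"] == "tetramer")


options = {
    "promoter": ["unbound", "RNAP"],
    "attB": ["unbound", "bound", "tetramer"],
    "attP": ["unbound", "bound", "tetramer"],
}
print(len(CombinatorialSet(options).resolve()))                            # 18
print(len(CombinatorialSet(options, [tetramer_needs_partner]).resolve()))  # 10
```

Independent options are just the rule-free case. The filtering step is where a smarter (for example graph-based) resolver would replace brute force once there are many integrases, sites, and DNA molecules.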

WilliamIX commented 4 years ago

I'm not sure I quite understand the problem you are pointing out - but maybe the solution is to have a class DNA_network which is a PolymerNetwork, in other words, DNA_network is to DNA_construct as PolymerNetworkSpecies is to OrderedPolymerSpecies.

  1. Start with all DNA_constructs that have integrase sites. Figure out which of them match and put them through the integrase Component Enumeration to get more DNA_constructs. Maybe instead of returning only DNA_constructs, DNA_network objects are also returned to represent the bound tetramer complexes.
  2. These DNA_constructs and DNA_networks can both call update_species/reactions on their DNA_parts and do combinatoric enumeration.

dr3y commented 4 years ago

Either way, neither DNA_construct nor OrderedPolymerSpecies knows what the different combinatorial options should be. That is a job for mechanisms.

WilliamIX commented 4 years ago

Notes on Compilation Order:

Component Enumeration Steps Produce More Components

  1. Global Component Enumeration (Components can interact with other Components)
  2. Local Component Enumeration (Components only know about themselves)

Mechanism Steps Produce Species and Reactions

  1. Component.update_species/reaction applies Component Mechanisms
  2. Mixture Applies Global Mechanisms

Combinatorial Enumeration (Such as Binding) is part of 2. Can this be packaged as a helper function in OrderedPolymerSpecies somehow?

Notes on Combinatorial Enumeration:

OrderedPolymerSpecies.combinatorial_enumeration([list of OrderedPolymerSpecies of the same length])
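One way to read this signature, sketched under stated assumptions: each input polymer is a variant of the same construct (for example, one where a single part is bound), and the function picks, independently at every position, any monomer seen at that position. Polymers are modeled as plain tuples of labels; this standalone function is hypothetical, not a method of the real OrderedPolymerSpecies.

```python
from itertools import product
from typing import List, Tuple

Polymer = Tuple[str, ...]  # hypothetical: a polymer as a tuple of monomer labels


def combinatorial_enumeration(variants: List[Polymer]) -> List[Polymer]:
    """Return every polymer obtained by choosing, independently at each
    position, one of the monomers that appears there in any variant."""
    if len({len(v) for v in variants}) > 1:
        raise ValueError("all polymers must have the same length")
    # For each position, collect the distinct monomers seen there, keeping
    # first-seen order (dict.fromkeys de-duplicates while preserving order).
    choices = [list(dict.fromkeys(column)) for column in zip(*variants)]
    # Independent choice at every position -> Cartesian product.
    return [tuple(combo) for combo in product(*choices)]


unbound = ("ptet", "UTR1", "GFP", "t16")
tetr_bound = ("ptet:tetr", "UTR1", "GFP", "t16")
ribosome_bound = ("ptet", "UTR1:ribosome", "GFP", "t16")
for polymer in combinatorial_enumeration([unbound, tetr_bound, ribosome_bound]):
    print(polymer)
```

This prints four polymers, including ('ptet:tetr', 'UTR1:ribosome', 'GFP', 't16'), a state none of the inputs contained. It relies on the choices at different positions being independent, which holds for the DNA_construct case but not for dependent binding such as integrase tetramers.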

Notes on Binding Polymers Together:

class PolymerNetwork(OrderedPolymer): A set of coupled OrderedPolymers.

class DNA_construct_network(DNA_construct): A set of coupled DNA_constructs

class PolymerNetworkSpecies(OrderedPolymerSpecies): This class contains a list of 1+ OrderedPolymerSpecies and a list of 1+ ComplexSpecies which contain 2+ OrderedMonomerSpecies (and other species are allowed as well).

How does this all add up? (Using integrases as an example)

  1. Global Component Enumeration produces a bunch of DNA_construct networks
  2. Each DNA_construct and DNA_construct network does local component enumeration. DNA_construct networks also use the local component enumeration of their internal DNA_constructs. This step involves all combinatorial enumeration.
  3. All the DNA_parts created above update species and reactions.

Classes

dr3y commented 3 years ago

I am starting to work on this a little bit.

The first thing that I have made is a TxTlExplorer_CE which is a component enumerator version of TxTlExplorer. By default this is packaged together with DNA_construct and RNA_construct objects.

This can be found in my branch here https://github.com/dr3y/BioCRNPyler/tree/component_enumerator

Here is an example code that demonstrates its functionality:

```python
from biocrnpyler import *  # the original snippet omitted its imports

ptet = RegulatedPromoter("ptet",["tetr"],leak=True) #this is a repressible promoter
pconst = Promoter("pconst") #constitutive promoter
utr1 = RBS("UTR1") #regular RBS
gfp = CDS("GFP","GFP") #first one is the name of the `dna_part`, second one is the name of the protein that is made
t16 = Terminator("t16") #a terminator stops transcription
construct_1 = DNA_construct([[ptet,"forward"],[utr1,"forward"],[gfp,"forward"],[t16,"forward"]])
y = construct_1.enumerate_components()
print(y)
#[rna = UTR1_GFP_t16, GFP_2, DNA_construct = ptet_UTR1_GFP_t16]
print(y[2][0].transcript.pretty_print())
#rna[rna[UTR1-forward]:rna[GFP-forward]:rna[t16-forward]]
print(y[2][0].protein[0].pretty_print())
#protein[GFP]
```

WilliamIX commented 3 years ago

Roughly the same as issue #11.

WilliamIX commented 3 years ago

Compilation Overview:

  1. Global Component Enumeration: can look at all Components; happens in the Mixture. This should be order independent and allow global component enumerators to interact.
  2. Local Component Enumeration: can look at just a single Component; happens in the Component. This is also order independent, because local component enumerators do not interact.
  3. Species/Reaction Compilation via Component Mechanisms
  4. Global Mechanisms

Design Specifications:

The two new classes:

Mixture has a general recursion(depth = N) method which calls all the GlobalComponentEnumerators

Local Example: DNAassembly has a TxExplorer LocalComponentEnumerator which creates RNAassemblies that have TlExplorer LocalComponentEnumerators (and other things). TlExplorer returns new components which do not have Enumerators, so the recursion ends.

Simple Splicing would be RNAs with Splicing LocalComponentEnumerators.

Transplicing (RNA that can get spliced onto another RNA) would be GlobalComponentEnumeration.
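The compilation order above, together with the recursion(depth = N) idea, can be sketched as a loop that alternates global and local enumeration until no new Components appear or the depth limit is hit. Everything here is a hypothetical illustration: Components are plain strings, and `compile_components`, `tx_explorer`, `tl_explorer` and `trans_splicing` are toy stand-ins for the Mixture method and the enumerators described above.

```python
from typing import Callable, List

# Hypothetical stand-ins: components are plain strings such as "DNA:A".
Component = str
LocalEnumerator = Callable[[Component], List[Component]]
GlobalEnumerator = Callable[[List[Component]], List[Component]]


def compile_components(components: List[Component],
                       global_enumerators: List[GlobalEnumerator],
                       local_enumerators: List[LocalEnumerator],
                       depth: int = 5) -> List[Component]:
    """Alternate global and local component enumeration until nothing new
    appears or `depth` rounds have run. Every enumerator in a round sees the
    same snapshot, so the set produced in a round does not depend on the
    order in which the enumerators are called."""
    known = list(components)
    for _ in range(depth):
        produced = []
        for enumerate_global in global_enumerators:  # sees every Component
            produced.extend(enumerate_global(known))
        for component in known:                      # sees one Component
            for enumerate_local in local_enumerators:
                produced.extend(enumerate_local(component))
        new = [c for c in dict.fromkeys(produced) if c not in known]
        if not new:  # fixed point reached: recursion ends early
            break
        known.extend(new)
    return known


def tx_explorer(component: Component) -> List[Component]:
    # Local: a DNA assembly produces its RNA.
    return ["RNA:" + component[4:]] if component.startswith("DNA:") else []


def tl_explorer(component: Component) -> List[Component]:
    # Local: an RNA produces its protein; proteins enumerate nothing further.
    return ["protein:" + component[4:]] if component.startswith("RNA:") else []


def trans_splicing(components: List[Component]) -> List[Component]:
    # Global: join each pair of unspliced RNAs into one chimeric RNA.
    rnas = sorted(c for c in components if c.startswith("RNA:") and "+" not in c)
    return ["RNA:" + a[4:] + "+" + b[4:] for a in rnas for b in rnas if a < b]


print(compile_components(["DNA:A", "DNA:B"], [trans_splicing],
                         [tx_explorer, tl_explorer]))
# ['DNA:A', 'DNA:B', 'RNA:A', 'RNA:B', 'RNA:A+B', 'protein:A', 'protein:B', 'protein:A+B']
```

As in the DNAassembly example, the recursion stops on its own once the newest Components (here the proteins) have no enumerators that produce anything new; the depth limit only matters for enumerators that could keep generating Components forever.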

WilliamIX commented 3 years ago

This has been added via component enumeration, which is now part of the compilation process: Mixtures can call Component.enumerate_components to produce new Components.