SynBioDex / SEPs

SBOL Enhancement Proposals

SEP 035 -- Interactions and Models on ComponentDefinition #80

Closed jakebeal closed 4 years ago

jakebeal commented 5 years ago

Proposal for a bridging path to merging ComponentDefinitions and ModuleDefinitions before SBOL 3 by the simple expedient of adding Interactions and Models to ComponentDefinition (and related changes). https://github.com/SynBioDex/SEPs/blob/master/sep_035.md

cjmyers commented 5 years ago

I'm generally supportive of this idea, but it will take some substantial plumbing in the libraries to make it work. I think we need to be very careful in how this gets used, since it could hamper exchange between tools. We would like to make use of this in visualization tools like VisBol, since it would allow flattening out some of the hierarchy that cannot currently be flattened, but we might want the actual exchange to still happen using Interactions/Models on ModuleDefinitions. Of course, this assumes that there are tools that would be broken by the change, which is something we might check before worrying too much about the compatibility issues.

graik commented 5 years ago

An example would be helpful to see how "functional" connections are supposed to be implemented between Component(Definition)s. If I understand the text correctly, you want to use SequenceConstraints for expressing, e.g., that a promoter recognizes a certain target sequence? I think that is not a good idea because it mixes structural (sequence) and functional information.

Instead all this could very elegantly be expressed through Interaction with appropriate participations:

[Figure: Interaction diagram; image not preserved]

Since we have role properties, this would also allow defining abstract devices for which, for example, the actual TF and target sequence still need to be defined. This solution would IMO make mapsTo obsolete in a simpler fashion than SEP 35 and 37.

jakebeal commented 5 years ago

@graik You appear to be responding to something quite different than what this is proposing. Here, I'm mostly just proposing a "backward compatible" CD/MD merger in which we add all the current functionality of MD into the CD class and stop bothering to create new MDs.

The adjustments to ComponentDefinition are really just there to prevent nonsense ideas like trying to define a DNA sequence for a small molecule.

bbartley commented 5 years ago

I certainly agree with the motivations behind this SEP, but I would like to offer a slightly different approach that will accomplish much of the same thing with a few extra benefits. Namely, making ModuleDefinition a subclass of ComponentDefinition. I have drafted an alternative SEP, but I would need an editor or another developer to voice interest in order to officially submit it.

jakebeal commented 5 years ago

@bbartley Would your SEP be backward compatible (i.e., possible as 2.4) or would it need to be 3.0? If the latter, then I'm in favor of simply eliminating ModuleDefinition entirely.

bbartley commented 5 years ago

I think there's a migration path through 2.4, similar to yours, but of course others should review it.

jakebeal commented 5 years ago

Share a link?

bbartley commented 5 years ago

https://github.com/bbartley/SEPs/blob/master/sep_038.md

cjmyers commented 5 years ago

I think this might be backwards. I thought the goal was to add Interactions to CD.

My personal feeling is this is too big a change for 2.4. We would need to do substantial work on the libraries and software, and that would be time better spent on 3.0.

bbartley commented 5 years ago

A concern I have about this proposal is that it eliminates ModuleDefinition from the data model entirely, thus throwing the baby out with the bath water. Keep in mind that one of the high-level motivations behind this proposal and our movement to SBOL 3.0 is to harmonize the data model with SBOL visual. So this proposal should also address how it will impact our convention for drawing modules with SBOL visual.

jamesamcl commented 5 years ago

@cjmyers already voiced my opinion, but to put it in writing here - I feel that this SEP would significantly increase the complexity of an already confusing specification with little or no benefit, and potentially at the cost of delaying 3.0 even further by bodging things in the meantime.

With regards to the merging in general, I think that most of us are now on the same page. We should take advantage of that rare occurrence ;-) and invest our time and energy into making a clean 3.0.

jakebeal commented 5 years ago

If SBOL 3 proceeds expeditiously, I'm fine to have this become obsolete. That said, I also suggest this as a possible path to merging MD and CD: just union the two, whatever we call the outcome.

graik commented 5 years ago

This is a substantial change -- I would also suggest to implement it cleanly in 3.0 rather than having a rather complex version 2.4 with a hybrid data model.

I am not sure it is wise to overload Component(Definition) with additional meaning that goes away from "structural composition". It is an advantage to have one single class that is clearly responsible for representing the sequence level of a design (be it DNA or protein) -- this facilitates badly needed uptake by sequence editing software and low-level design tools.

I agree that Module was relatively poorly designed (after a lot of discussion). If it remains simply an unordered bag of components that pretty much copies CD, then, yes, I understand the urge to merge the two. By contrast, I would argue that, if we want to get rid of Module, then the class to replace it with should not be Component(Definition) but Interaction. With some small modifications, this would create a clearly structured "operational" layer on top of the structural layer of Components. The beauty of Interaction is that every participant has a specified role and the type field of Interaction can be used to tell clients what roles to expect for a given type of Interaction (aka Device, aka Module). A structural description needs to tell where each component lies within the final sequence. A functional description needs to say how each component participates in a given module. Component(definition) satisfies the former requirement. Interaction satisfies the latter requirement. The current Module is useless for both.

cjmyers commented 5 years ago

No one is saying Module was poorly designed. That is not the issue being raised. The issue is that it is complicated to specify both structure and function simultaneously due to the split between CD and MD. This is one of the current main uses of MapsTos. If we merge CD and MD, then structure and function can co-exist more cleanly. Simply switching Module to Interaction will not solve this problem.

Here is an example. I want to say that I have a transcriptional unit that includes a CDS C that codes for protein P. This includes the structure of the transcriptional unit and an interaction stating that CDS C codes for P. Currently, this is encoded with separate structure and function like this:

CD TU
  Comp Pro Pr
  Comp RBS R
  Comp CDS C
  Comp Ter T
  SC Pr precedes R
  SC R precedes C
  SC C precedes T
NOTE: could use SA instead in this example, no big difference

MD myCircuit
  FC TU
    MapsTo local C to remote C
  FC C
  FC P
  Interaction C codes for P

If CD and MD are merged, we get this instead:

CD TU
  Comp Pro Pr
  Comp RBS R
  Comp CDS C
  Comp Ter T
  SC Pr precedes R
  SC R precedes C
  SC C precedes T
  Interaction C codes for P
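The two encodings above can be compared with a small Python sketch. This is only an illustration: the `Definition` class, its field names, and the `summarize` helper are hypothetical stand-ins and are not the API of any SBOL library.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical toy stand-in for a CD or MD (not a real SBOL class)."""
    name: str
    components: list = field(default_factory=list)    # (kind, name) pairs
    constraints: list = field(default_factory=list)   # (subject, relation, object)
    interactions: list = field(default_factory=list)  # (subject, verb, object)
    maps_to: list = field(default_factory=list)       # (carrier, local, remote)


PARTS = [("Pro", "Pr"), ("RBS", "R"), ("CDS", "C"), ("Ter", "T")]
ORDER = [("Pr", "precedes", "R"), ("R", "precedes", "C"), ("C", "precedes", "T")]
CODES = ("C", "codes for", "P")


def split_encoding():
    """Structure in CD TU, function in MD myCircuit, joined by a MapsTo."""
    tu = Definition("TU", components=list(PARTS), constraints=list(ORDER))
    circuit = Definition(
        "myCircuit",
        components=[("FC", "TU"), ("FC", "C"), ("FC", "P")],
        interactions=[CODES],
        maps_to=[("TU", "local C", "remote C")],
    )
    return [tu, circuit]


def merged_encoding():
    """Structure and the interaction live together in the single CD TU."""
    return [Definition("TU", components=list(PARTS),
                       constraints=list(ORDER), interactions=[CODES])]


def summarize(objects):
    """Count what a reader of the data has to follow to link C to P."""
    return {
        "top_level_objects": len(objects),
        "maps_to": sum(len(o.maps_to) for o in objects),
        "interactions": sum(len(o.interactions) for o in objects),
    }


split_summary = summarize(split_encoding())
merged_summary = summarize(merged_encoding())
```

Under these assumptions the merged form needs one top-level object and no MapsTo, while the split form needs two objects and one MapsTo to express the same interaction.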

jakebeal commented 5 years ago

@graik Please feel free to make a competing proposal if you think you have a better way of organizing functional information. Somehow, however, I will need to say that a particular system has a whole collection of interactions all tied together. I have a suspicion that when you propose a way of handling this, it will converge back to something similar to my proposal.

graik commented 5 years ago

Yes, my proposal would look very similar. But I would not allow CD to "own" Interactions. There should be a clear hierarchy: Interaction "owns" Component(Definition), not Interaction <-> CD. This would give a better separation of concerns for tool developers because there are clear layers. In your example:

(1) Parts: CD Pro Pr, CD RBS R, CD CDS C, CD Ter T, ..., Protein P
(2) Structure: CD plasmid (Comp Pro Pr, Comp RBS R, ... sequence constraints)
(3) Function: Interaction C codes for P

No need for MapsTo or FC or all that. Layer (2) could initially be left out for later "compilation to sequence". More complex examples will need Interactions that can have other Interactions as participants. That's the main modification needed to the data model.

I agree that this is a separate proposal.
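A rough Python sketch of the layering described in this comment, where an Interaction owns its participants and may itself contain other Interactions. The class names, the role names ("template", "product", "promoter", "expression"), and the interaction type strings are all assumptions made for illustration; none of them come from the SBOL specification.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Structural layer: a part such as a promoter, CDS, or protein."""
    name: str
    kind: str


@dataclass
class Interaction:
    """Functional layer: owns participants, each under a named role."""
    type: str
    participants: dict = field(default_factory=dict)  # role -> Component or Interaction


def structural_parts(interaction):
    """Walk down from an Interaction to every Component it depends on."""
    found = []
    for participant in interaction.participants.values():
        if isinstance(participant, Interaction):
            found.extend(structural_parts(participant))  # nested Interaction
        else:
            found.append(participant.name)
    return found


cds = Component("C", "CDS")
protein = Component("P", "Protein")
production = Interaction("genetic production", {"template": cds, "product": protein})
unit = Interaction(
    "transcription unit",
    {"promoter": Component("Pr", "Pro"), "expression": production},
)
parts = structural_parts(unit)
```

The links only point downward, from function to structure, which is the directionality argued for here; a Component never refers back to an Interaction.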

cjmyers commented 5 years ago

In your example, there is nothing that ties the structure to the function. Including the interactions and structure in the same object achieves this. In this way, you can indicate which interactions are appropriate for a given structure, and perhaps also for a given context. Indeed, currently when we are defining strains, we are including all of this information together in a single MD to make it clear that all of this information relates to a particular strain.

For visualizations, you also need these tied together to know that these relationships should be visualized together, which is what in some sense has motivated this change.

graik commented 5 years ago

@cjmyers In my (rough) model, your visualization software needs to look for Interactions. If the Interaction is well formulated, all the information is tied in there, including the links down to all the required structural elements. In my example, Interaction is minimal and only gives the link CD -> protein. But one could define a transcription unit Interaction that ties things together like the top Component in Jake's example. This will indeed look pretty similar to Jake's model but have the advantage of giving a clear directionality.

Another issue with using Component(Definition) for everything is that functional assemblies very often do not correspond to structural units (unlike in this example, where everything is neatly on one DNA strand next to each other). So in real-world examples, there would often be several overlapping Component(Definition)s, e.g. two CDs defining two plasmids plus another CD assembling the functional circuit from sub-elements of both plasmids. Where do you start then? How does your software decide that the latter CD is important for drawing the diagram?

Anyway, there are different solutions to this. Merging Module into Component is not the worst one. But it continues to put a lot of heavy lifting on software to infer and guess what is relevant for a given problem.

Greetings Raik

jakebeal commented 5 years ago

@graik An important thing that we're currently using ModuleDefinitions to represent is compositions where there is not a structural relationship, such as media recipes, plasmid mixtures, and reaction structures. I'm not sure how your proposal addresses these.

graik commented 5 years ago

Yes, I totally agree that this is important. I would just argue that we don't help anyone if we lump everything we cannot otherwise express into ComponentDefinition. The whole point of a class is to represent one specific thing. The clean solution would be to create new classes that specifically solve such newly identified problems.

Now this is a different discussion. It would be nice to put that on the agenda for one of the Skype meetings. My personal prime candidate for a new class would be Cell or Strain. Something like Reaction (linking to components with associated concentrations) may also be useful.

cjmyers commented 5 years ago

@graik It is exactly lumping classes that this proposal is about. We tried doing this as separate classes, and it made for a complex data model. The goal here is to make the data model more streamlined and easier to use for a wide variety of purposes. While having a separate class for every type of object we might want to represent sounds appealing, it makes the data model more complex and not very flexible when we discover there is something else we want to represent.

The merging of CD and MD has the potential to substantially reduce the size of the specification and the difficulty in representing new things. Goals I believe we all agree on. Simply changing the names of classes or adding new classes would go against these important goals. We will talk about the merge at a future meeting.

graik commented 5 years ago

Hi Chris,

first of all, I do agree that Module should be merged.

On Tue, Sep 3, 2019 at 12:18 AM cjmyers notifications@github.com wrote:

@graik https://github.com/graik It is exactly lumping classes that this proposal is about. We tried doing this as separate classes, and it made for a complex data model. The goal here is to make the data model more streamlined and easier to use for a wide variety of purposes. While having a separate class for every type of object we might want to represent sounds appealing, it makes the data model more complex and not very flexible when we discover there is something else we want to represent.

There are currently 30 validation rules listed for ComponentDefinition alone. About 10 of those deal with the inconsistencies, conflicts, and potential nonsense data that arise from the fact that DNA, protein, RNA and small molecules are all represented by the same class. (Just look for validation rules that start "If the type of a ComponentDefinition contains..."; 10520 is a marvelous example.) If you now, in addition, use the same class for functional assemblies, you will need to add many more validation rules that prevent the inclusion of a reaction mixture in a plasmid sequence or the representation of a protein that contains a cell, etc. Interpreting SBOL data then becomes even more a game of "inference" from context. That's not what a streamlined data model is about. Otherwise, I suggest we merge all the remaining SBOL classes into Component as well. That would be only logical if the class count is your measure of complexity.
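To make the point about type-conditional validation concrete, here is a minimal Python sketch of what such a rule looks like when one class covers several molecule types. The type names, encoding labels, and the `validate` function are hypothetical; this does not reproduce rule 10520 or any other actual SBOL validation rule.

```python
# Hypothetical table: which sequence encodings each type may carry.
ALLOWED_ENCODINGS = {
    "DNA": {"IUPAC-DNA"},
    "RNA": {"IUPAC-RNA"},
    "Protein": {"IUPAC-protein"},
    "SmallMolecule": {"SMILES"},
}


def validate(definition_type, sequence_encodings):
    """Return a list of error strings; an empty list means the data passed."""
    allowed = ALLOWED_ENCODINGS.get(definition_type)
    if allowed is None:
        return [f"unknown type: {definition_type}"]
    return [
        f"{definition_type}: encoding {encoding} not allowed"
        for encoding in sequence_encodings
        if encoding not in allowed
    ]


ok = validate("DNA", ["IUPAC-DNA"])
bad = validate("SmallMolecule", ["IUPAC-DNA"])
```

Every new kind of content folded into the same class adds rows and cross-checks to a table like this, which is the growth in rules being warned about.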

Since this is becoming nerd-philosophical, I might as well point to the "Zen of Python", 20 or so basic guidelines behind the design of one of the most successful and popular programming languages -- most of it also applies to our effort: https://inventwithpython.com/blog/2018/08/17/the-zen-of-python-explained/ The first 4 rules are:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.

If we want to represent a Reaction mixture in SBOL, then we should explicitly do so. It is the simplest solution to the problem. Otherwise a programmer has to guess the meaning of the data from a combination of fields. More generally, we want to model pretty complex things here... so some complexity cannot be avoided, that's just the nature of our domain. What we can avoid is making things unnecessarily complicated by pretending the complexity does not exist.

The merging of CD and MD has the potential to substantially reduce the size of the specification and the difficulty in representing new things. Goals I believe we all agree on. Simply changing the names of classes or adding new classes would go against these important goals. We will talk about the merge at a future meeting.

Again, I agree with the general thrust of removing Module, FunctionalComponent and all the complicated mappings.

Greetings Raik

bbartley commented 5 years ago

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.

While neither endorsing nor disagreeing with any specific proposal in this thread, I whole-heartedly agree with this nerd-philosophy. :)

I also recommend Thomas Gruber's Principles for the Design of Ontologies used for Knowledge Sharing which are similar:

cjmyers commented 5 years ago

Hi Raik,

I’m definitely not advocating merging all classes into one. We also need to clean up the validation rules. This is an orthogonal issue.

There are many other languages in other domains for expressing hierarchical structures (from software to hardware). These languages do not have two types of hierarchical objects. They have a single type of object that can express both structure and function. For example, see the Verilog language where both structure and function can co-exist in the same object type, the module. This is all we are advocating. It is the most flexible approach. Again, this is not about minimizing classes. Rather it is about the split of structure and function that has made expressing objects that have both structure and function unnecessarily complex. Any creation of new specific classes will just extend this problem further.

Cheers, Chris

bbartley commented 5 years ago

FWIW, I feel the best compromise between Chris' and Raik's perspectives is to make ModuleDefinition a subclass of ComponentDefinition. It effectively merges the separate hierarchies into a single hierarchy, makes clear semantic "layers" of structure and function, avoids overloading ComponentDefinition with too many properties, and also has the nice benefit of immediately harmonizing the data model with SBOL visual. I'm sure there are cons as well as pros to this approach, but it seems straightforward to me.
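As a rough illustration of the subclassing idea, the following Python sketch shows ModuleDefinition inheriting the structural fields of ComponentDefinition and adding the functional ones. The field names are assumptions chosen for this example and are not taken from SEP 038 or from any SBOL library.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentDefinition:
    """Structural layer: sub-parts and their ordering constraints."""
    name: str
    subcomponents: list = field(default_factory=list)
    sequence_constraints: list = field(default_factory=list)


@dataclass
class ModuleDefinition(ComponentDefinition):
    """Functional layer: everything a CD has, plus interactions and models."""
    interactions: list = field(default_factory=list)
    models: list = field(default_factory=list)


tu = ModuleDefinition(
    "TU",
    subcomponents=["Pr", "R", "C", "T"],
    sequence_constraints=[("Pr", "precedes", "R"), ("R", "precedes", "C"),
                          ("C", "precedes", "T")],
    interactions=[("C", "codes for", "P")],
)

# A tool that only understands structure can still treat tu as a CD.
is_structural = isinstance(tu, ComponentDefinition)
```

A purely structural object stays a plain ComponentDefinition with no interaction fields at all, which is how this sketch avoids overloading the base class.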

cjmyers commented 5 years ago

I can get behind this proposal. I think it is a good compromise, and it makes a fairly clear and clean migration path from SBOL2 to SBOL3.

graik commented 5 years ago

Thanks, Brian, for this suggestion! This could work well.

So the class hierarchy could either look like this:

ComponentDef ... use for generic assembly of parts (mostly "functional" modules)
  -> StructureComponent ... add sequence-properties, features annotations, etc for DNA, protein, RNA

IMO better (though that is an independent improvement):

Component (renamed from ComponentDefinition) ... use for throwing disparate entities into functional modules
  -> DNA ... add: strand, topology, sequence, sequence features; restrict sub-parts to DNA
  -> RNA ... add: sequence, sequence features; restrict sub-parts to RNA
  -> Protein ... add: sequence, sequence features; restrict sub-parts to Protein
  -> SmallMolecule ... add: smiles sequence, chemID, others; do not allow sub-parts(?)
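The second hierarchy could be sketched in Python roughly as follows. This is a sketch under stated assumptions: the `accepts` and `add` methods, the constructor parameters, and the example part names are invented here to show how each subclass might restrict its sub-parts, and they are not part of any SBOL proposal.

```python
class Component:
    """Generic container: any Component may be a sub-part."""

    def __init__(self, name):
        self.name = name
        self.subparts = []

    def accepts(self, part):
        return isinstance(part, Component)

    def add(self, part):
        if not self.accepts(part):
            raise TypeError(
                f"{type(self).__name__} {self.name!r} cannot contain "
                f"{type(part).__name__}"
            )
        self.subparts.append(part)
        return self  # allow chaining


class DNA(Component):
    def __init__(self, name, sequence="", topology="linear"):
        super().__init__(name)
        self.sequence = sequence
        self.topology = topology

    def accepts(self, part):
        return isinstance(part, DNA)  # restrict sub-parts to DNA


class Protein(Component):
    def __init__(self, name, sequence=""):
        super().__init__(name)
        self.sequence = sequence

    def accepts(self, part):
        return isinstance(part, Protein)  # restrict sub-parts to Protein


class SmallMolecule(Component):
    def __init__(self, name, smiles=""):
        super().__init__(name)
        self.smiles = smiles

    def accepts(self, part):
        return False  # no sub-parts allowed


# A generic Component can mix disparate entities.
mixture = Component("induction_mix")
mixture.add(DNA("plasmid", topology="circular")).add(SmallMolecule("IPTG"))

# A typed Component rejects the wrong kind of sub-part.
try:
    Protein("LacI").add(DNA("lacO"))
    rejected = None
except TypeError as error:
    rejected = str(error)
```

Here the restriction lives in the class itself rather than in a separate list of type-conditional validation rules; an RNA subclass would follow the same pattern as DNA and Protein.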

jakebeal commented 4 years ago

I am withdrawing this in favor of SEP 25 (#58), which this was only intended to bridge toward in any case. If SBOL 3 does not proceed efficiently, I reserve the right to reopen.