SynBioDex / SEPs

SBOL Enhancement Proposals

SEP 014 -- Using SBOL to Model the Design-Build-Test Cycle #31

Closed bbartley closed 5 years ago

bbartley commented 7 years ago

Linking experimental data with SBOL designs is becoming critical to a number of important projects. Therefore, this SEP introduces a Design-Build-Test data model for SBOL. SEP 14 is here: https://github.com/SynBioDex/SEPs/blob/master/sep_014.md

cjmyers commented 7 years ago

1) I’m actually open to the idea of productionStatus being a new field, since if we need this for ModuleDefinition, then we would need to add type anyway. That being said, we may want to add type to MD anyway to be able to say if a MD represents a host. For now, I’m guessing we do this with a TaxonId annotation. Could this field be a Boolean? Do we have more than two potential values? Does it need to be 0..* or should it be 0..1?

2) In the Test class, “hashes” should be “hash”, and the format URL link to NuML is broken. Any idea of a good ontology for formats? I would prefer that this was a required field actually. Not sure if the URI Plan class works for protocol or not.

3) Prefer not to have a validation rule for Test not being allowed for designs. I think we should be open to using Test to point to simulation data too.

4) I’m not excited about Test being linked to from CD, since I would prefer they indicate the host/environment of the test using a MD. However, I imagine that we might need to allow this for those reluctant to use MD class. However, I’m really unsure what it means to test a CD independent of a host/environment. On second thought, maybe this is a good way to motivate the MD class to those working at the CD level.

graik commented 7 years ago

Hi all,

I very much like this proposal. I have some concern though about attaching experimental data to ComponentDef directly.

On Thu, Jul 13, 2017 at 10:06 PM, cjmyers notifications@github.com wrote:

1.

I’m actually open to the idea of productionStatus being a new field, since if we need this for ModuleDefinition, then we would need to add type anyway. That being said, we may

Very definitely a new field is much better than using type. Indeed, I think it would be a very good idea to also get the topology information (double / single stranded, circular) into its own field. Arguably, this is an even more fundamental property than production status or even role. Something to keep in mind for version 3.

4.

I’m not excited about Test being linked to from CD, since I would prefer they indicate the host/environment of the test using a MD. However, I imagine that we might need to allow this for those reluctant to use MD class. However, I’m really unsure what it means to test a CD independent of a host/environment. On second thought, maybe this is a good way to motivate the MD class to those working at the CD level.

This is a major concern. In my own practice, it is absolutely crucial that, e.g., sequencing data are NOT directly linked to a plasmid record but are instead linked to a SAMPLE object. SAMPLE.content then links to either ComponentDefinition directly (a naked DNA sample) or it links to a CELL which links to ComponentDefinition (a clone of a cell containing a plasmid). Sample obviously links to a LOCATION where I can find it and it has some basic history of how it was derived from other samples.

An experimentalist needs to know which sample an experiment was performed on. Each clone (of cells or DNA derived from those cells) potentially has unknown mutations, samples become corrupted or mixed up, etc., and we may need to re-validate them or want to re-use them later. For publication, this level of detail may be stripped away. So in this particular case, direct attachment of TEST to Component might make sense.

In any case, ComponentDefinition should not become mixed up with this concept of an experimental sample. It already is too broadly defined as it stands. If anyone is interested, all this is implemented in rotmic and has been in use and served us well for several years. So please go check out the data model.

Greetings Raik

cjmyers commented 7 years ago

I was thinking a bit more about the Test class, and I thought of a potential problem. The current idea assumes one protocol produces one data file. I think that may not be true. One experiment I could imagine could produce more than one set of data, or certainly multiple representations of the data. For example, you might want to attach to a “Test”, a link to the raw data, a link to the processed data, and a link to a graphical representation of the processed data. In order to address this issue, I considered perhaps what we want to do is:

1) Formalize the Attachment class that SynBioHub is using as proper SBOL, and allow all TopLevel (or Identified) objects to be able to reference Attachments.

2) Reduce the Test class to a protocol and it would then be able to have 0 or more Attachments for the data.

However, if Test is only a protocol, then I was wondering if PROV-O actually is the solution. Namely, we could do something like:

1) Link a ModuleDefinition to an Attachment that includes raw data, the Attachment would have a wasGeneratedBy pointing to an Activity that references the protocol used to generate this data.

2) Have a 2nd Attachment which is the processed data that has a wasGeneratedBy pointing to the Activity that processes the raw data.

3) Have a 3rd Attachment which is the graphical representation that has a wasGeneratedBy pointing to the Activity that graphs the processed data.

Essentially use PROV-O to stitch the entire design-build-test flow together.
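The three-step chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Activity` and `Attachment` classes are plain stand-ins rather than any SBOL library API, the `used` field borrows the PROV-O notion of an Activity using an input, and every `ex:` URI and file location is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Activity:
    """Stand-in for a prov:Activity (not a real library class)."""
    uri: str
    plan: Optional[str] = None                      # protocol followed, if recorded
    used: List[str] = field(default_factory=list)   # URIs of the inputs consumed


@dataclass
class Attachment:
    """Hypothetical Attachment: a file reference plus how it was produced."""
    uri: str
    source: str
    was_generated_by: Optional[Activity] = None


def lineage(start: Attachment, attachments: Dict[str, Attachment]) -> List[str]:
    """Follow wasGeneratedBy -> used links back through earlier attachments."""
    chain = [start.uri]
    current = start
    while current.was_generated_by is not None:
        parents = [u for u in current.was_generated_by.used if u in attachments]
        if not parents:
            break  # the input was not an attachment (e.g. the ModuleDefinition)
        current = attachments[parents[0]]
        chain.append(current.uri)
    return chain


# 1) raw data, generated by an Activity that references the protocol
measure = Activity("ex:measure", plan="ex:plate_reader_protocol", used=["ex:strain_md"])
raw = Attachment("ex:raw", "http://example.org/raw.csv", measure)
# 2) processed data, generated by the Activity that processes the raw data
process = Activity("ex:process", used=["ex:raw"])
processed = Attachment("ex:processed", "http://example.org/processed.csv", process)
# 3) graphical representation, generated by the Activity that graphs it
plot = Activity("ex:plot", used=["ex:processed"])
figure = Attachment("ex:figure", "http://example.org/figure.png", plot)

attachments = {a.uri: a for a in (raw, processed, figure)}
print(lineage(figure, attachments))
```

Walking back from the figure recovers the processed and raw files in order, which is the "stitching" property the PROV-O approach is meant to give.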

bbartley commented 6 years ago

Responding to Raik first.

Sample... has some basic history of how it was derived from other samples.

Assuming this SEP is enacted, sample history can in fact be captured using the PROV-O classes which are already part of the data model.

In my own practice, it is absolutely crucial that, e.g., sequencing data are NOT directly linked to a plasmid record but are instead linked to a SAMPLE object.

This is one of the motivations for this SEP, and your experience corroborates that of myself and the other authors. It is necessary to distinguish what the user intended to build (design) from what the user actually built (build). In this SEP, we represent a sample by using a ComponentDefinition with productionStatus:build.

An experimentalist needs to know which sample an experiment was performed on. Each clone (of cells or DNA derived from those cells) potentially has unknown mutations, samples become corrupted or mixed up, etc., and we may need to re-validate them or want to re-use them later.

What you describe is encompassed by this SEP. See Example 1 (https://github.com/SynBioDex/SEPs/blob/master/sep_014.md#example). A Test can be associated with a ComponentDefinition representing a clone.

In any case, ComponentDefinition should not become mixed up with this concept of an experimental sample. It already is too broadly defined as it stands.

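There are two main reasons for using a ComponentDefinition to represent a sample:

  • In some cases, it may be necessary to use SequenceAnnotations or Components to describe the substructure of a sample, especially when the sample does not match the target. Therefore it is advantageous to use ComponentDefinitions to represent both a design and a build (sample). For further discussion, see the third paragraph under Production Status (https://github.com/SynBioDex/SEPs/blob/master/sep_014.md#indicating-the-production-status).
  • The consensus sequence for a given plasmid clone or sample is represented by the Sequence object that is associated with the ComponentDefinition representing the build. See Example 1 (https://github.com/SynBioDex/SEPs/blob/master/sep_014.md#example). The target sequence is represented by a Sequence associated with a design.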

Thanks, Bryan

bbartley commented 6 years ago

Now responding to Chris

One experiment I could imagine could produce more than one set of data, or certainly multiple representations of the data.

Agreed.

Formalize the Attachment class that SynBioHub is using as proper SBOL, and allow all TopLevel (or Identified) objects to be able to reference Attachments.

Specification of the Attachment class goes beyond the scope of this SEP. However, this SEP is compatible with that vision. See Relation to Other Proposals (https://github.com/SynBioDex/SEPs/blob/master/sep_014.md#otherproposals) for discussion of Attachments and associated metadata.

Reduce the Test class to a protocol and it would then be able to have 0 or more Attachments for the data.

The latest revision to this SEP does essentially this. All metadata has been stripped from the Test class. Currently, a Test refers directly to external files through its attachments property. No metadata is specified. Tooling will have to infer the data type of the attachment through a file extension, but in the short term this should be workable.

However, the attachments property could be easily co-opted in the future to refer to Attachment objects which contain important metadata about an external file link.
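As a rough illustration of what "infer the data type through a file extension" could look like in tooling, here is a minimal Python sketch. The extension table, the labels, and the example URLs are all assumptions made for illustration; nothing here is defined by the SEP.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup table; a real tool would choose its own set of formats.
KNOWN_EXTENSIONS = {
    ".csv": "tabular data",
    ".fcs": "flow cytometry data",
    ".ab1": "sequencing trace",
}


def infer_data_type(attachment_url: str) -> Optional[str]:
    """Guess the kind of data behind an attachment URL from its file extension."""
    path = urlparse(attachment_url).path           # drops any query string
    suffix = PurePosixPath(path).suffix.lower()    # e.g. ".AB1" becomes ".ab1"
    return KNOWN_EXTENSIONS.get(suffix)            # None when the type is unknown


print(infer_data_type("http://example.org/runs/42/trace.AB1?download=1"))
print(infer_data_type("http://example.org/runs/42/results"))
```

The second call returns `None`, which shows the limit of this approach: a link without a recognizable extension carries no type information at all, and that is the gap dedicated Attachment metadata would close.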

graik commented 6 years ago

Hi Bryan,

I put sbol-dev in CC because this is a general design issue that others should look at, too.

I very much agree with adding a builtstatus field to ComponentDefinition. And especially when it comes to publication, it may often be most straightforward to directly attach experimental info to a ComponentDefinition.

But ComponentDefinition should not be made to represent a physical sample. It should also not be made to represent a clone. ComponentDefinition is meant to represent a molecule (in 99% of cases) or (in 100%) a part of a molecular design. This is a completely different concept than the representation of a tube in some freezer.

How do you want to encode what buffer a DNA molecule is stored in? How do you want to encode the concentration of it? How do you want to encode the fact that there is a mixture of molecules (each with its own concentration) within this sample? None of that should be found in a ComponentDefinition record unless you want to cause maximal confusion. Vice versa, what would be the meaning of a sequence feature attached to a glycerol stock? This is different territory and we may choose not to deal with it but we should not further broaden the use of ComponentDefinition just because we cannot agree on adding a new class.

My suggestion is that your SEP should clearly state that ComponentDefinition is NOT meant to represent a sample or a clone. We could then draft a further SEP to define a sbol.Sample class. At this point, most of the sub-fields should be left undefined because needs are quite different and often sample information stays in-house. But at least programmers would know where to attach this kind of information.

Greetings Raik

cjmyers commented 6 years ago

Hi Raik,

I agree with you. We did discuss this at Harmony, and while a ComponentDefinition could represent an isolated DNA molecule (plasmid), it would not represent it within its context (tube, host, etc.). This would be accomplished using ModuleDefinition class. The ModuleDefinition could represent a tube or a strain. For example, if the ModuleDefinition is representing a host cell, it could then include the Component for the plasmid that has been transformed into this host.

Cheers,

Chris

bbartley commented 6 years ago

How do you want to encode what buffer a DNA molecule is stored in? How do you want to encode the concentration of it? How do you want to encode the fact that there is a mixture of molecules (each with its own concentration) within this sample?

That is beyond the scope of this SEP. These issues are important, but we won't quickly agree on how to represent a sample. The issue addressed by this SEP is more fundamental -- does a ComponentDefinition represent a concept in the user's head, or does it actually describe the structure of an entity in the real world? Design, Build, Test. What stage of the synthetic biology life cycle are we in?

(Also, I think I'm using the word clone slightly differently than you. I'm using it to refer to a plasmid clone, such as you might isolate during the sequence verification process. I'm not using it to refer to a cell clone or freezer stock.)

Thanks Bryan

cjmyers commented 6 years ago

It looks like the UML for Test has not been updated yet.

I think Attachment class should be included in this SEP. It is a prerequisite to this being useful, and it is a simple class, so it would be nice to include it. Also, the “attachments” property should be added to TopLevel and not just Test. I’m not keen on Test having this property, since it will be redundant with the TopLevel property.

bbartley commented 6 years ago

UML updated.

Perhaps others can comment on whether an Attachment class should be included in this SEP.

jamesamcl commented 6 years ago

The problem is that Attachments are not simple. We can't just take the SynBioHub idea of Attachment and formalize it into SBOL directly. For example, SynBioHub attachments don't provide any information about where to retrieve the attachment from, only the file hash. We also need to decide how to represent the type of the file (e.g. MIME types), etc.

Also, in SynBioHub, Attachments can be attached to absolutely anything, so it's not just related to the Test class, which I think makes it beyond the scope of this SEP.
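To make the open questions concrete, the following Python sketch shows one possible shape for the metadata mentioned above: a retrieval location, a file type, and a hash. It is purely an assumption for discussion. The field names, the use of SHA-256, and the example URL are hypothetical and do not reflect what SynBioHub stores or what any SEP specifies.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """Hypothetical attachment record; field names are illustrative only."""
    source: str       # where the file can be retrieved from
    media_type: str   # file type, expressed here as a MIME type
    sha256: str       # hex digest of the file contents


def describe(source: str, media_type: str, content: bytes) -> Attachment:
    """Build an attachment record for content that is already in hand."""
    return Attachment(source, media_type, hashlib.sha256(content).hexdigest())


def matches(attachment: Attachment, content: bytes) -> bool:
    """Check that retrieved content is the file the record describes."""
    return hashlib.sha256(content).hexdigest() == attachment.sha256


growth_curve = b"time,od600\n0,0.05\n60,0.21\n"
record = describe("http://example.org/data/growth.csv", "text/csv", growth_curve)
print(matches(record, growth_curve))
print(matches(record, b"corrupted download"))
```

A hash alone identifies a file but cannot locate it, so a record like this pairs the hash with a source and a type, which is the combination the comment above says is currently missing.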

cjmyers commented 6 years ago

James: are you willing to put forward an SEP for attachments then in short order? We should get that one approved before approving the experimental data one, since it will depend on it.

graik commented 6 years ago

Hi Chris,

I agree with you. We did discuss this at Harmony, and while a ComponentDefinition could represent an isolated DNA molecule (plasmid), it would not represent it within its context (tube, host, etc.). This would be accomplished using ModuleDefinition class. The ModuleDefinition could represent a tube or a strain. For example, if the ModuleDefinition is representing a host cell, it could then include the Component for the plasmid that has been transformed into this host.

ModuleDefinition is meant to group design elements together for purposes of abstraction or so that we can say something that applies to the whole rather than the parts. This still has little to do with an Eppendorf tube in some freezer. Sample management is a logistical problem, not a design problem. We are complicating the life of both application programmers and library developers if our classes have several unrelated purposes. That means the programmer has to untangle the actual meaning of an object from its fields and sub-fields. Perhaps a sample class could be derived from ModuleDefinition but it certainly needs its own class.

Greetings Raik

bbartley commented 6 years ago

Hi Raik, Chris

What we found at HARMONY is that we don't have a clear consensus about how to represent samples. Chris is not alone in arguing that ModuleDefinition might be used to represent details about a sample. That is why I deliberately chose to limit the scope of this SEP. Its purpose is not to describe samples in detail. However, it does support sequence verification workflows. From my point of view, that is fundamental.

Best Bryan

bbartley commented 6 years ago

Hi Chris,

James: are you willing to put forward an SEP for attachments then in short order? We should get that one approved before approving the experimental data one, since it will depend on it.

From your point of view, why is an Attachment class REQUIRED in order to implement this SEP?

I see this current revision as entirely workable, with an easy migration path towards adding Attachments to the data model in a future proposal.

@jakebeal I think your input might be a critical tie breaker on this. Do you see the current SEP as workable, or would you like us to work out the semantics of Attachments?

jakebeal commented 6 years ago

Here is my take. I believe that SBOL's value comes primarily as an "integration hub" for linking different aspects of biological engineering workflows. This means that, while we do not want to get down into the weeds of LIMS systems, metrology, and experimental data exchange, we do need to be able to represent the critical engineering decisions associated with them.

To this end, I see a high degree of value in being able to distinguish between "idealized" engineered artifacts (design) and realized instances. Critically, PROV-O lets us link these cleanly, as well as potentially attaching protocol descriptions to explain how we got from a design to a sample. PROV-O also lets us cleanly link an intended design to a realized design.

I also see it as worthwhile to allow this distinction to be attached to both ModuleDefinition and ComponentDefinition. The key point of this distinction is not the fine details of what happens in the lab (I agree those are best left to LIMS systems), but to have a clean representation of critical engineering decisions. For example, consider Raik's example of DNA being stored in a particular liquid buffer. We should be able to represent this in two different ways:

  1. Here is some DNA, stored in a way we expect to not have to care about as long as you do it "normally." Here the design would be represented by a ComponentDefinition.
  2. Here is some DNA whose storage medium is an unusual and notable part of the design. Here the design would be represented by a ModuleDefinition, which includes the media as a FunctionalComponent.

Thinking about it from this perspective, my expectation is that when it comes to physical samples, ModuleDefinition is more likely to be useful for talking about experiments with actual cells, while ComponentDefinition is more likely to be used for talking about a construction process and verification.
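The two options above can be compared side by side in a small Python sketch. The classes are simplified stand-ins named after the SBOL 2 classes discussed in this thread, not a real library API, and all URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentDefinition:
    uri: str
    name: str


@dataclass
class FunctionalComponent:
    uri: str
    definition: ComponentDefinition   # the ComponentDefinition this instance refers to


@dataclass
class ModuleDefinition:
    uri: str
    functional_components: List[FunctionalComponent] = field(default_factory=list)


plasmid = ComponentDefinition("http://example.org/cd/plasmid", "plasmid")

# Option 1: storage is unremarkable, so the design is just the ComponentDefinition.
design_1 = plasmid

# Option 2: the storage medium is a notable part of the design, so the design is a
# ModuleDefinition that includes both the DNA and the medium as FunctionalComponents.
storage_buffer = ComponentDefinition("http://example.org/cd/buffer", "storage buffer")
design_2 = ModuleDefinition(
    "http://example.org/md/plasmid_in_buffer",
    [
        FunctionalComponent("http://example.org/fc/plasmid", plasmid),
        FunctionalComponent("http://example.org/fc/buffer", storage_buffer),
    ],
)

print([fc.definition.name for fc in design_2.functional_components])
```

In both options the plasmid itself is the same ComponentDefinition; only the second one makes the medium an explicit, queryable part of the design.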

So far, so good, and I think without any controversy.

As I am working out more use cases, however, I am becoming uncomfortable with the particulars of this proposal, and my discomforts are leading me to an alternative that I think is still quite simple. Here are some of my sources of discomfort:

These are pointing me toward a conclusion that while I think the (extremely simple) information we're trying to encode is the right information, we need to make an adjustment in the representation. Since this comment is getting super-long, I will follow with another comment with my new proposal.

jakebeal commented 6 years ago

Here is my alternate proposal, which tries to capture the same information with the following two differences:

  1. A cleaner distinction between intention and reality
  2. Designs aren't forked until you actually know they differ from their original.

New classes, with their fields:

  • Sample: this represents something physical
      • field: specification [1]: link to a ComponentDefinition or ModuleDefinition
      • field: data [0 .. *]: links to Data objects

In my proposal, the Sample class plays exactly the same role as the productionStatus field in the current proposal. A Sample is equivalent to a derived ComponentDefinition / ModuleDefinition with its productionStatus set to build. The difference is that we don't have to copy all of the sub-structure of the CD/MD, just link to it. If the reality turns out to be different, then we can fork the CD/MD at that point, using PROV-O to link just as we would have before. We can also use PROV-O to link the Sample to its intended CD/MD, in order to represent the whole process: "Sample X was supposed to be an instance of ModuleDefinition A, but instead I ended up with ModuleDefinition A'".

The data field is identical to tests, just renamed to follow my proposed adjustment to that class.

The fields of this class (and Data, below) are modeled exactly after Model. At some later point we may add more fields, but not in this proposal. The idea is that Protocol gets used as part of PROV-O links talking about the derivation of a Sample from a ComponentDefinition or ModuleDefinition or of one Sample from another Sample.

Mostly there I just renamed Test to expand the notion that data can come from any stage of sample manipulation, not just a "testing" stage. We don't need the protocol field because it can be embedded with PROV-O if desired, just as for the samples. I also propose dropping the fields focused on data transport.
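To make the "link, don't copy" idea concrete, here is a minimal Python sketch of a Sample that points at its specification and only forks the definition once sequencing shows a difference. The class and function names, the `_observed` URI suffix, and the `intended` field (standing in for a PROV-O link from the Sample to its intended CD/MD) are assumptions for illustration, not part of any SBOL library or of the proposal text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Definition:
    """Stand-in for a ComponentDefinition or ModuleDefinition (hypothetical)."""
    uri: str
    sequence: str
    was_derived_from: Optional[str] = None  # PROV-O style link to a parent


@dataclass
class Sample:
    """Something physical: links to its specification instead of copying it."""
    uri: str
    specification: Definition                      # specification [1]
    data: List[str] = field(default_factory=list)  # data [0..*], URIs of Data objects
    intended: Optional[Definition] = None          # assumption: stands in for a PROV-O link


def record_observed_sequence(sample: Sample, observed: str, data_uri: str) -> Sample:
    """Attach a data link; fork the definition only if reality differs."""
    sample.data.append(data_uri)
    if observed != sample.specification.sequence:
        original = sample.specification
        forked = Definition(
            uri=original.uri + "_observed",  # hypothetical naming scheme
            sequence=observed,
            was_derived_from=original.uri,
        )
        sample.intended = original
        sample.specification = forked
    return sample
```

As long as the observed sequence matches, every Sample keeps sharing the one full-detail definition; a new definition appears only for the sample that actually diverged.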

graik commented 6 years ago

Sample is not such a good name then. It's more of an "experimental realization". So perhaps "Experiment" or "Implementation"?

jakebeal commented 6 years ago

I'm not deeply attached to the name. Let's talk about the data model first, however, and then make sure we get the best synonym.

graik commented 6 years ago

I think this could work. You are saying: "this is an Implementation of (link to CD or module)", this is what we did, and here are the data recorded with it or validating it.

jakebeal commented 6 years ago

Exactly.

bbartley commented 6 years ago

Hi Jake,

I'm not sure all your criticisms are fair, and therefore I don't see a need for a new proposal. Please see my response to your comments below.

...until I do so the "physical" ModuleDefinition is still really an intention and not a known reality.

I don't understand this. If something is "physical", it is real.

Sometimes we build something, it works, then we sequence it and find out what worked was actually a beneficial mutant. We then add that to the collection of ComponentDefinitions, where it goes from being physical back to being a design.

I feel like this use case is perfectly accommodated by the current SEP. I thought I had explained it in the text, but I see now I just half explained it. Anyway, I don't think this is really a problem, and I can update the SEP to explain this in more detail.

In your proposal, you state:

The difference is that we don't have to copy all of the sub-structure of the CD/MD, just link to it.

This is already explicitly stated in the current proposal. There is also a pretty clear UML diagram of this in Example 1:

For a given design many builds may be generated. In general, the design should serve as a reference to which builds are compared either for quality control (sequence verification) or comparison of observed versus expected output (experimental data vs. model predictions). Therefore, as a best practice, a user SHOULD NOT recursively copy all the Components and Modules which describe the compositional hierarchy of a design over to each new build generated, as this would be inefficient and redundant. A build object SHOULD be a simple ComponentDefinition or ModuleDefinition containing no subparts

A cleaner distinction between intention and reality

This argument has little weight with me now. Some of us argued at HARMONY for taking a more explicit, knowledge-representation approach. There were 3 possibilities discussed:

  1. Derive Design and Build from CD. That was the original proposal.
  2. Add new TopLevel classes, for Design and Build, and reference a CD or MD from them. This is similar to your approach here with Sample.
  3. Use an ontology term, because in the future we might want to add more detailed stages other than design and build.

We went with 3, which was a concession to you, Jake! Now it appears we are back to something like option 2.

I just renamed Test to expand the notion that data can come from any stage of sample manipulation

Can you provide an example of sample manipulation that would not qualify as a Test?

One thing I would like to emphasize. Our current proposal defines clear semantics about where data should be attached. A Test class represents empirical data. A Model represents simulation data. These each occupy a special place in the Design-Build-Test-Learn cycle (see Example 3). I think it is very important that Test and Model remain conceptually distinct and explicit. What would make sense to me is deriving both Test and Model from an abstract Data class.

Furthermore, this SEP has another clear semantic about where data should be attached. Structural data (including sequence verification data) should be attached to CD. Characterization data should be attached to MD. This means client tooling has a very good idea where to look for certain kinds of data. I feel like this is an important consideration, since we seem to be discussing adding Data or Attachments to arbitrary SBOL objects.

Regards, Bryan

jakebeal commented 6 years ago

Let me focus on the heart of my concern, which is my discomfort with exactly this part of the proposal:

Therefore, as a best practice, a user SHOULD NOT recursively copy all the Components and Modules which describe the compositional hierarchy of a design over to each new build generated, as this would be inefficient and redundant. A build object SHOULD be a simple ComponentDefinition or ModuleDefinition containing no subparts

With this best practice, we would be recommending effectively using a ComponentDefinition or ModuleDefinition only as a pointer to another "master" copy, by means of the PROV-O link. That is a very different usage than we have ever had previously. Critically, the ComponentDefinition (or, equivalently ModuleDefinition) is no longer "self-contained," in the sense that you can find out what it is just by looking at child Components, Sequences, etc. Moreover, wasDerivedFrom can have multiple links, per SEP012 --- what does it mean if we link an "empty" ComponentDefinition to multiple sources by wasDerivedFrom? How do we even reason about this or effectively detect it? Is this new usage limited only to "design"/"build" relations or can it relate between two designs as well?
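One way to think about the detection question is the following Python sketch, which classifies a definition by whether it carries its own content or only derivation links. The `Definition` stand-in, the `classify` helper, and its category names are hypothetical; the heuristic is an assumption for illustration and not a validation rule from the SEP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Definition:
    """Stand-in for a ComponentDefinition or ModuleDefinition (hypothetical)."""
    uri: str
    components: List[str] = field(default_factory=list)
    sequences: List[str] = field(default_factory=list)
    was_derived_from: List[str] = field(default_factory=list)


def classify(definition: Definition) -> str:
    """Label a definition by how much of its meaning it carries itself."""
    if definition.components or definition.sequences:
        return "self-contained"
    if len(definition.was_derived_from) == 1:
        return "shallow-pointer"    # meaning lives entirely in the one source
    if len(definition.was_derived_from) > 1:
        return "ambiguous-pointer"  # empty, but linked to several sources
    return "empty"
```

Even in this toy form, the "ambiguous-pointer" case has no obvious interpretation, which is the difficulty raised above.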

I know that my position was different last month, but as I've been working through my use cases, I've been getting progressively more uncomfortable with the repurposing of ComponentDefinition and ModuleDefinition as a sort of proxy pointer. I feel that this is a larger change of the meaning of the data model than is being accounted for, but if we prohibit this usage, then we have lots of cloning and the problems of describing something before we can verify it.

This is the core of my concerns, and I believe this issue needs to be addressed one way or another.

bbartley commented 6 years ago

Hi Jake,

With this best practice, we would be recommending effectively using a ComponentDefinition or ModuleDefinition only as a pointer to another "master" copy, by means of the PROV-O link

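This is not the only reason we are using CD or MD to represent builds. The other reason we are using CD and MD to represent builds (discussed in the SEP, and the comments above) is as follows:

  • In some cases, it may be necessary to use SequenceAnnotations or Components to describe the substructure of a sample or annotate it, especially when the sample does not match the target. This is similar to a use case you cited earlier: Sometimes we build something, it works, then we sequence it and find out what worked was actually a beneficial mutant. We then add that to the collection of ComponentDefinitions, where it goes from being physical back to being a design.
  • The consensus sequence for a given plasmid clone or sample is represented by the Sequence object that is associated with the ComponentDefinition representing the build. See Example 1. The target sequence is represented by a Sequence associated with a design.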

With regards to concerns about the wasDerivedFrom field, indeed there is ambiguity in how a wasDerivedFrom property may be interpreted. Those I think are deeper issues that go beyond the scope of this SEP. The examples you cite sound like edge cases to me. Also, nothing in this proposal is outside the recommended usage of wasDerivedFrom. The W3C spec is as follows: "A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity."

In earlier versions of the SEP, these ambiguities were not an issue, because we favored explicit naming of classes, similar to the approach you took here with the Sample class. Also, I'd like to point out that specification of your Sample class...

Sample: this represents something physical
field: specification [1]: link to a ComponentDefinition or ModuleDefinition
field: data [0 .. *]: links to Data objects

...is pretty much identical to the Build class in the original proposal! Except design has changed to specification, test has changed to data, and Build has changed to Sample. So much for design-build-test! This is very ironic to me.

graik commented 6 years ago

Hi Bryan, Hi Jake,

I think it is perfectly fine to change opinions -- that's the difference between open discussion and ideological debate ;) So let's please remember that we are all looking for a good solution here and fair or unfair has nothing to do with it. Please let's be nice to each other and stay focused on solving the problem.

The next thing to remark is that we have essentially two proposals now, but only one of them is documented in a SEP, and the discussion thread has become long enough for others to get lost. So I propose that Jake and I write up an alternative SEP. Bryan, is your original proposal with a special class still around? We should obviously have a careful look at it.

I still think it is a good idea to have a "build status" or "production status" property directly in Component and Module (-definition). The cases I would distinguish are: (1) design -- not implemented yet; (2) under construction (building) -- being implemented, which can easily take months; (3) build completed.

Once built, the molecule or system (e.g. bacterial clone or cell line) will undergo testing. One or more validation experiments will be run. In the easy case of cloning, validation (sequencing) may show that the plasmid is not what we want. Then this result should still be attached to the same component, making clear that "the build failed". In the very rare event that you think the build failed but the result is useful anyway, you can create a new version of the ComponentDefinition (PROV-O derived from the old one) and attach the same experimental result.

There are other possible outcomes: two validation experiments may give conflicting results, or the results may be incomplete. If you go from plasmid construction to circuits, there is no clear-cut distinction between "success" and "failure" anyway, but you still want to be able to attach results of some sort. I would argue we should first focus on "Build Validation" experiments, where there is a relatively clear-cut interpretation of results (implemented as designed or not). "Functional Validation" is another, related, problem.

So what we probably agree on is that (1) we need some way to put a status on a CD and MD, and (2) we need a representation of a "Validation Experiment" (Bryan calls it Test, I would prefer a less generic name) with links to protocols and result data.

And here comes the problem and the disagreement: Experiments are not performed on abstract designs but on actual physical batches. Typical examples are cell clones after plasmid construction, cell culture batches after a genome engineering experiment, or one particular batch of enzyme mixed with one particular batch of cell-free extract. For cell clones, there are even "batches of batches" that may start to differ or may become corrupted by virus/phage infection etc. SBOL has no concept or representation for any of this because this is not in the domain of design any longer.

The suggestion coming out of your HARMONY discussion is to re-use shallow copies of ComponentDefinition or ModuleDefinition to represent physical batches (/clones /cell lines /samples) in the lab. That's, in my and Jake's opinion, a bad choice. I don't want to re-iterate the arguments here but let me just say that this kind of experimental / logistical information is in my eyes out of scope for classes describing a design.

Greetings and have a nice weekend everyone, Raik

On Sat, Jul 22, 2017 at 8:26 AM, bbartley notifications@github.com wrote:

Hi Jake,

With this best practice, we would be recommending effectively using a ComponentDefinition or ModuleDefinition only as a pointer to another "master" copy, by means of the PROV-O link

This is not the only reason we are using CD or MD to represent builds. The other reason we are using CD and MD to represent builds (discussed in the SEP, and the comments above) is as follows:

  • In some cases, it may be necessary to use SequenceAnnotations or Components to describe the substructure of a sample or annotate it, especially when the sample does not match the target. This is similar to a use case you cited earlier: Sometimes we build something, it works, then we sequence it and find out what worked was actually a beneficial mutant. We then add that to the collection of ComponentDefinitions, where it goes from being physical back to being a design.
  • The consensus sequence for a given plasmid clone or sample is represented by the Sequence object that is associated with the ComponentDefinition representing the build. See Example 1. The target sequence is represented by a Sequence associated with a design.

jakebeal commented 6 years ago

Hi, Bryan:

With this best practice, we would be recommending effectively using a ComponentDefinition or ModuleDefinition only as a pointer to another "master" copy, by means of the PROV-O link

This is not the only reason we are using CD or MD to represent builds. [snip]

I agree, and I also agree that we must be able to describe the contents of samples. However, there is more than one way to achieve this. The fact that CD and MD are convenient in other ways does not affect my concerns about the change of semantics needed in order to use an empty CD/MD as a pointer.

As I approach the question of solutions, it is indeed true that my thoughts do have a good deal of commonality with your original proposal. I would not view this as reverting, but as "spiraling up" to a view that includes the good parts of both the old and new proposal. The key differences in what I am proposing are (again, not worrying about names):

There are other minor differences, but indeed, I have come around to the view expressed by both yourself and Raik that it is valuable to have not just a field but a whole separate class to represent a physical object, so that we can have lightweight "pointers" for representing large numbers of samples.

bbartley commented 6 years ago

I think it is perfectly fine to change opinions -- that's the difference between open discussion and ideological debate ;) So let's please remember that we are all looking for a good solution here and fair or unfair has nothing to do with it.

No problem with changing opinions. However, this feels like we are going in circles instead of converging. I hope that we are indeed "spiraling up" as Jake said. At this stage, an entirely new proposal might solve some issues, but at the same time it will likely introduce new issues, or worse re-introduce old issues which have already been discussed.

Fundamentally, a CD represents structure. IMHO, I should be able to use a CD to describe real, physical, manufactured structures, as well as theoretical, conceptual structures. When we start talking about Samples then the issue gets convoluted. I'm not trying to use CD to describe samples, I'm trying to use it to describe structure. I hope that any forthcoming proposal is at least consistent with this fundamental interpretation.

jakebeal commented 6 years ago

I absolutely agree with you that a CD represents structure, and that we should be able to use it to describe real, physical structures. That is exactly why I want to not use "shallow" CDs as pointers to "real" CDs. Likewise for MDs.

I think we need to separate the "pointer" as a separate class, whatever the right name turns out to be, whether it be "Sample" or "Build" or "Aliquot" or "PhysicalThing" or whatever else might be the best fit for a representation of a physically instantiated design that somehow points to a CD or MD that describes it fully.

graik commented 6 years ago

We are talking about physical implementations (or experimental realizations) of a given design. This is not related to structure at all. Different implementations (for example different clones) need to be distinguishable because they may or may not be validated by experiments. They may all originate from the same experiment or they may be created with different methods in different labs but they all point to the same design (CD or MD).

Let me try from another angle: We need a new class "Implementation" for the same reason that we have "Component" (a.k.a. SubPart) instead of creating a new "ComponentDefinition" each time a part is re-used in a sequence design. Or again from another angle: we are crossing a boundary here from design to experiment. Using ComponentDefinition or ModuleDefinition to enumerate bacterial colonies, cell lines or enzyme batches in the lab is just a really bad idea.

Good night, Raik

cjmyers commented 6 years ago

Hi,

I’m going to take a shot at seeing if I can try to unify the proposals. How about?

Experiment

  • Design: URI reference to CD or MD design
  • Build: URI reference to CD or MD build (may be same or different than design)
  • Tests [0..*]: links to Test objects

Test

  • Protocol: URI reference to a protocol
  • Data [0..*]: links to Data objects

Data

  • Source: URI (perhaps a reference to an attachment object)
  • Format: URI
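As a rough illustration, the three classes above could be written as Python dataclasses. This is only a sketch of the shape of the data: URIs are plain strings, the `build_matches_design` helper is an assumption added to show the "same or different" case, and none of this corresponds to an existing SBOL library API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Data:
    source: str  # URI, perhaps a reference to an attachment object
    format: str  # URI identifying the data format


@dataclass
class Test:
    protocol: str                                   # URI reference to a protocol
    data: List[Data] = field(default_factory=list)  # Data [0..*]


@dataclass
class Experiment:
    design: str                                     # URI of the CD or MD design
    build: str                                      # URI of the CD or MD build
    tests: List[Test] = field(default_factory=list) # Tests [0..*]

    def build_matches_design(self) -> bool:
        """True when the build points at the same definition as the design."""
        return self.design == self.build
```

In this shape, an Experiment whose build turned out as intended simply repeats the design URI, and a divergent build points at a different definition.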

Chris

graik commented 6 years ago

Hi Chris,

I also think the proposals can be unified. However, your suggestion is still missing the (I think) most important point of the discussion: We need a class for "physical implementation" that is not CD or MD (but referring to it). This is because we need to be able to say: "These results apply to this experimental batch / clone / particular batch of cells". Without this concept, CD or MD are turned into representing experimental clones and batches (as it is done in this SEP) which is completely violating the scope of what CD and MD are supposed to represent and will confuse us for years to come (and trigger an avalanche of validation rules that are not needed if we keep this clearly separated). One model would be:

Implementation

ValidationExperiment

ComponentDefinition

Greetings Raik

jakebeal commented 6 years ago

I had an insight --- I think the productionStatus field (possibly renamed) needs to be on the Implementation / Sample / Build, rather than on the ComponentDefinition / ModuleDefinition.

The reason we are making these "pointers" is to be able to make distinctions like "this is a real thing" vs. "this is an intention." In all of the proposals that have been made, we would be using a single CD/MD to provide the full-detail description of both a design and many actual samples --- the "no-content copies" are then allowing us to distinguish the physical/virtual nature of the different instances. So no matter what we do, we need to have a productionStatus associated with each sample, rather than with the full-detail CD/MD.

We can do this without having a "no-content copy" if we associate the field with the pointer to the design in the Implementation / Sample / Build, something like:

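Implementation

  • design [1] -> ComponentDefinition / ModuleDefinition
  • designStatus [1] -> (#designIntent, #confirmedInSample)
  • test [0 .. *] -> Test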
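A minimal Python sketch of an Implementation that carries the status on the pointer rather than on the design might look like this. The two status values `#designIntent` and `#confirmedInSample` come from this discussion; the Python names, the default status, and the `confirmed_implementations` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DesignStatus(Enum):
    DESIGN_INTENT = "#designIntent"
    CONFIRMED_IN_SAMPLE = "#confirmedInSample"


@dataclass
class Implementation:
    uri: str
    design: str  # design [1]: URI of a ComponentDefinition / ModuleDefinition
    design_status: DesignStatus = DesignStatus.DESIGN_INTENT  # designStatus [1]
    test: List[str] = field(default_factory=list)             # test [0..*]: Test URIs


def confirmed_implementations(impls: List[Implementation], design_uri: str) -> List[Implementation]:
    """Implementations of one design whose contents were confirmed in the sample."""
    return [
        i for i in impls
        if i.design == design_uri
        and i.design_status is DesignStatus.CONFIRMED_IN_SAMPLE
    ]
```

Many Implementations can then share one full-detail design while each records separately whether it is still an intention or a confirmed reality.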

graik commented 6 years ago

I think status flags may be useful on each level: (1) The Test or ValidationExperiment could use a flag telling us whether this particular test has turned out as expected (e.g. a single sequencing trace is what we expect). (2) The Implementation instance certainly needs a flag telling us whether, judging from all validation runs (e.g. forward AND reverse sequencing), this particular clone is confirmed to be identical to the intended design (though we may later still find out that there is a problem with it). (3) And the MD or CD may also have a flag telling us whether there is supposed to be any correct implementation available, for example, or whether this DNA construct has been stuck at the design stage forever. So that would be Bryan's productionStatus. This would be quite important to quickly filter through many designs or to mark "built" designs without revealing all the details about clones and samples.
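The three levels could roll up roughly as in the following Python sketch. The roll-up rules (an implementation counts as confirmed only if it has at least one test and all of its tests turned out as expected; a design counts as built if any implementation is confirmed) are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Dict, List


def implementation_confirmed(test_flags: List[bool]) -> bool:
    """Level 2: confirmed only if there is at least one test and none failed."""
    return bool(test_flags) and all(test_flags)


def design_has_confirmed_build(implementations: Dict[str, List[bool]]) -> bool:
    """Level 3: the design is 'built' if any implementation is confirmed."""
    return any(implementation_confirmed(flags) for flags in implementations.values())
```

Here each implementation is represented only by its list of per-test flags (level 1), for example `{"clone1": [True, True], "clone2": [True, False]}`.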

Greetings Raik

On Sun, Jul 23, 2017 at 3:01 PM, Jacob Beal notifications@github.com wrote:

I had an insight --- I think the productionStatus field (possibly renamed) needs to be on the Implementation / Sample / Build, rather than on the ComponentDefinition / ModuleDefinition.

The reason we are making these "pointers" is to be able to make distinctions like "this is a real thing" vs. "this is an intention." In all of the proposals that have been made, we would be using a single CD/MD to provide the full-detail description of both a design and many actual samples --- the "no-content copies" are then allowing us to distinguish the physical/virtual nature of the different instances. So no matter what we do, we need to have a productionStatus associated with each sample, rather than with the full-detail CD/MD.

We can do this without having a "no-content copy" if we associate the field with the pointer to the design in the Implementation / Sample / Build, something like:

Implementation

  • design [1] -> ComponentDefinition / ModuleDefinition
  • designStatus [1] --> (#designIntent, #confirmedInSample)
  • test [0 .. *] -> Test
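As a rough illustration of the outline above, the following Python sketch keeps the status on the pointer object rather than on the CD/MD. The class layout mirrors the three fields listed; the example URIs and variable names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DesignStatus(Enum):
    DESIGN_INTENT = "#designIntent"
    CONFIRMED_IN_SAMPLE = "#confirmedInSample"


@dataclass
class Implementation:
    design: str                  # [1] URI of the ComponentDefinition / ModuleDefinition
    design_status: DesignStatus  # [1] status lives here, not on the CD/MD
    tests: List[str] = field(default_factory=list)  # [0..*] URIs of Test objects


# One full-detail design, referenced by an intention and by a real clone.
DESIGN_URI = "http://example.org/cd/pLac_GFP"  # hypothetical URI
intended = Implementation(DESIGN_URI, DesignStatus.DESIGN_INTENT)
clone_3 = Implementation(
    DESIGN_URI,
    DesignStatus.CONFIRMED_IN_SAMPLE,
    tests=["http://example.org/test/seq_run_17"],  # hypothetical URI
)
```

Both objects point at the same CD, so no "no-content copy" of the design is needed to tell the intention apart from the physical instance.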


--


Raik Grünberg http://www.raiks.de/contact.html


bbartley commented 6 years ago

Some good suggestions from everybody here.

Cheers, Bryan

cjmyers commented 6 years ago

The iGEM registry / SynBioHub has several different status fields that map to these in some way. For example, I think igem#experience : Works maps very nicely to the Test stage. There are also #partStatus, #sampleStatus, and #status. It would be nice to know what the full range of values are for these fields.

The PartStatus field is not really useful, it only has two values “Deleted” and “Released HQ 2013”.

The SampleStatus field is a bit more useful, it has values, “Discontinued”, “For Reference Only”, “In Stock”, “It’s Complicated”, “No Part Sequence”, and “Not in Stock”.

The Status field has values, “Available”, “Deleted”, “Informational”, “Planning”, and “Unavailable”.

The Experience field has values, “Fails”, “Issues”, “None”, and “Works”.

The huge problem with all of these fields is that they are not precisely defined, so they are used very inconsistently. Furthermore, they are not kept up-to-date. In any case, they require value judgements that can be quite arbitrary. One thing Doug has really been pushing in the LCP project is to “never say something works”. This, in his opinion, is completely meaningless. It is better to report metrics, which I agree with. I think we should be very careful about baking non-quantitative statements about functionality into the standard.

Chris

cjmyers commented 6 years ago

I disagree. The “Experiment” class is the physical implementation class you are looking for. Maybe you just don’t like the name, which is fine. We can call it Implementation, but I think I like Experiment better, since an implementation is really about an experiment. Namely, you design something, you build it, and then you test it. To me, this collection of steps is conducting an experiment, and the Experiment class I propose links them all together. Your approach is missing a link to the physical realization. My proposal has design and build separate because the design may not be correctly realized. It might be you get a different sequence than intended or the realization perhaps has scars that are not part of the design, or some other artifact from construction.

So, I’m standing by my proposal. I think it is the cleanest approach so far. It avoids making changes to existing classes, making it easier to use right away. This feature also means that we avoid duplicate CDs/MDs as the original proposal was forcing us to do. It makes it really easy to determine if a design has been tested, simply look for Experiments referencing this design. Finally, it is not really all that different from what you have below except that I see the “Experiment” as the organizing class.
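The "look for Experiments referencing this design" check can be sketched in a few lines of Python. This assumes experiments are available as simple in-memory records; the record type, helper names, and URIs are hypothetical and only illustrate the lookup.

```python
from typing import List, NamedTuple, Tuple


class ExperimentRecord(NamedTuple):
    design: str             # URI of the CD/MD design
    build: str              # URI of the CD/MD build
    tests: Tuple[str, ...]  # URIs of Test objects, possibly empty


def experiments_for(design_uri: str,
                    experiments: List[ExperimentRecord]) -> List[ExperimentRecord]:
    """Return every experiment that references the given design."""
    return [e for e in experiments if e.design == design_uri]


def is_tested(design_uri: str, experiments: List[ExperimentRecord]) -> bool:
    """A design counts as tested if any referencing experiment has a test."""
    return any(e.tests for e in experiments_for(design_uri, experiments))


records = [
    ExperimentRecord("ex:design_A", "ex:build_A1", ("ex:test_1",)),
    ExperimentRecord("ex:design_A", "ex:build_A2", ()),
    ExperimentRecord("ex:design_B", "ex:design_B", ()),
]
```

Because the reference points from the experiment to the design, the existing CD/MD classes stay untouched, which is the property argued for above.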

On Jul 23, 2017, at 10:36 AM, Raik Grünberg notifications@github.com wrote:

Hi Chris,

I also think the proposals can be unified. However, your suggestion is still missing the (I think) most important point of the discussion: We need a class for "physical implementation" that is not CD or MD (but referring to it). This is because we need to be able to say: "These results apply to this experimental batch / clone / particular batch of cells". Without this concept, CD or MD are turned into representing experimental clones and batches (as it is done in this SEP) which is completely violating the scope of what CD and MD are supposed to represent and will confuse us for years to come (and trigger an avalanche of validation rules that are not needed if we keep this clearly separated). One model would be:

Implementation

  • design -> ComponentDefinition
  • validation -> ValidationExperiment

ValidationExperiment

  • data
  • protocol
  • validation_result: "confirmed" / "failed" / "ambiguous" / "unknown"

ComponentDefinition

  • productionStatus: built
  • implementations -> ...
  • prov-o: derived_from -> ProvO record pointing to original design if different

Greetings Raik

On Sun, Jul 23, 2017 at 10:35 AM, cjmyers notifications@github.com wrote:

Hi,

I’m going to take a shot at seeing if I can try to unify the proposals. How about?

Experiment

  • Design: URI reference to CD or MD design
  • Build: URI reference to CD or MD build (may be same or different than design)
  • Tests: [0..*] links to Test objects

Test

  • Protocol: URI reference to a protocol
  • Data: [0..*] links to Data objects

Data

  • Source: URI (perhaps a reference to an attachment object)
  • Format: URI

Chris

On Jul 22, 2017, at 11:41 PM, Raik Grünberg notifications@github.com wrote:

We are talking about physical implementations (or experimental realizations) of a given design. This is not related to structure at all. Different implementations (for example different clones) need to be distinguishable because they may or may not be validated by experiments. They may all originate from the same experiment or they may be created with different methods in different labs but they all point to the same design (CD or MD).

Let me try from another angle: We need a new class "Implementation" for the same reason that we have "Component" (a.k.a. SubPart) instead of creating a new "ComponentDefinition" each time a part is re-used in a sequence design. Or again from another angle: we are crossing a boundary here from design to experiment. Using ComponentDefinition or ModuleDefinition to enumerate bacterial colonies, cell lines or enzyme batches in the lab is just a really bad idea.

Good night, Raik

On Sat, Jul 22, 2017 at 9:49 PM, Jacob Beal notifications@github.com wrote:

I absolutely agree with you that a CD represents structure, and that we should be able to use it to describe real, physical structures. That is exactly why I want to not use "shallow" CDs as pointers to "real" CDs. Likewise for MDs.

I think we need to separate the "pointer" as a separate class, whatever the right name turns out to be, whether it be "Sample" or "Build" or "Aliquot" or "PhysicalThing" or whatever else might be the best fit for a representation of a physically instantiated design that somehow points to a CD or MD that describes it fully.


graik commented 6 years ago

On Wed, Jul 26, 2017 at 11:45 AM, cjmyers notifications@github.com wrote:

I disagree. The “Experiment” class is the physical implementation class you are looking for. Maybe you just don’t like the name, which is fine. We can call it Implementation, but I think I like Experiment better, since an implementation is really about an experiment. Namely, you design something, you build it, and then you test it. To

OK, then this is where the confusion comes from. Physical implementations or realizations are particular clones, batches, cell lines etc. which are NOT experiments. They are the result of experimental work -- for example, a single cloning experiment will generate lots of distinct clones or batches of DNA, some correct, some not. These batches each need to be validated by experiments and they can be used in later experiments or become the basis of further construction work. Some clones/cell lines/implementations may seem correct now but later experiments reveal that they have an issue. So we need to be able to trace them.

If we rename your "Experiment" class to "Implementation" and your / Brian's "Test" to "Experiment" the two drafts are almost identical.

me, this collection of steps is conducting an experiment, and the Experiment class I propose links them all together. Your approach is missing a link to the physical realization. My proposal has design and build separate because the design may not be correctly realized. It might be you get a different sequence than intended or the realization perhaps has scars that are not part of the design, or some other artifact from construction.

Modern experimental practice is not very tolerant of such unintended effects. E.g. with gene synthesis and lab-internal cloning you either get exactly what you want or the result goes to the bin. So in this particular context (arguably most important for SBOL), your result is either correctly built or not. If you really want to continue with an incorrect build, you had better make a new version of the CD, and provO could take care of linking it to the old one.

However, commonly a particular clone or cell line may have acquired mutations or changes outside of your design area which you are either unaware of (yet) or which you don't care about. E.g. offsite-cutting in genome engineering or mutations on the plasmid backbone. Or your design only said "knock out this gene" and 10 different clones from your CRISPR experiment have the correct knockout but all of course look slightly different at the sequence level. If you do not want to use provO for this but want to have an explicit field for "here is a more detailed description of this particular clone", then I am fine with that. As more experiments are run on a particular clone, the "build" Component or Module will also become more detailed. So there is still room for provO versioning to be used.

So in summary, I agree with your outline and would mostly change names:

Implementation (e.g. a bacterial clone)

  • design: URI reference to CD or MD design
  • build: URI reference to CD or MD build (may be same or different than design) -- leave out if identical?
  • tests: [0..*] links to ValidationExperiment objects
  • build-status: URI (design / under_construction / built)
  • validation-status: URI (not_tested / correct / incorrect / ambiguous)

ValidationExperiment (e.g. a single sequencing run)

  • Protocol: URI reference to a protocol
  • Data: [0..*] links to Data objects
  • evaluation: set of URI tags that depend on type of experiment (e.g. confirmed / corrupt / incomplete)

Data

  • Source: URI (perhaps a reference to an attachment object)
  • Format: URI
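The outline can be written down as a small Python sketch to check that the pieces fit together. The class and field names follow the outline; the enum encoding, the default values, and the effective_build helper (one possible reading of "leave out if identical?") are assumptions, not part of any agreed model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class BuildStatus(Enum):
    DESIGN = "design"
    UNDER_CONSTRUCTION = "under_construction"
    BUILT = "built"


class ValidationStatus(Enum):
    NOT_TESTED = "not_tested"
    CORRECT = "correct"
    INCORRECT = "incorrect"
    AMBIGUOUS = "ambiguous"


@dataclass
class Data:
    source: str  # URI, perhaps of an attachment object
    format: str  # URI identifying the file format


@dataclass
class ValidationExperiment:
    protocol: str                                        # URI of a protocol
    data: List[Data] = field(default_factory=list)       # [0..*]
    evaluation: List[str] = field(default_factory=list)  # e.g. "confirmed"


@dataclass
class Implementation:
    design: str                  # URI of the CD or MD design
    build: Optional[str] = None  # omitted when identical to the design
    tests: List[ValidationExperiment] = field(default_factory=list)
    build_status: BuildStatus = BuildStatus.DESIGN
    validation_status: ValidationStatus = ValidationStatus.NOT_TESTED

    def effective_build(self) -> str:
        """Fall back to the design when no separate build is recorded."""
        return self.build if self.build is not None else self.design


# Hypothetical clone whose build was not described separately.
clone = Implementation(design="http://example.org/cd/knockout_design",
                       build_status=BuildStatus.BUILT)
```

With this reading, a more detailed build description can be attached later without changing how the clone refers to its design.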

I agree that the iGEM tagging is not a very good example. However, for the more narrow scope of whether or not something has been correctly built (never mind whether it actually works as intended), we can define useful and universal tags. The trick is to keep the scope indeed limited to "construction as specified by design" and not to get dragged into "this works" or "this doesn't work".

I have no strong opinion about the Data class. I guess a generic container for attachments would be extremely useful also for other SBOL classes and this would be the same as the data object, would it not?

Greetings Raik


cjmyers commented 6 years ago

I think I've discovered the source of our disagreement here. You have a much narrower view of this class than I do. I see a validation experiment as only one possible type of test. My view is that this class should cover all experiment types: not just "is this what was intended?" but also "what is its performance?". I think that, if done right, one general class can be used to tie together all elements of design-build-test and not just design-build-validate.

I don't think we are trying to restrict ourselves to just representing samples. The goal of this SEP is representing experiments.

Does this make sense?

Chris


graik commented 6 years ago

On Wed, Jul 26, 2017 at 2:41 PM, cjmyers notifications@github.com wrote:

I think I've discovered the source of our disagreement here. You have a much narrower view of this class than I do. I see a validation experiment as only one possible type of test. My view is this class should include all experiment types not just is this what was intended but also what is its performance. I think if done right one general class can be used to tie together all elements of design build test and not just design build validate.

Yes, this may be possible right away. I was advocating a more narrow "Build Validation Experiment" because I think that's the immediate problem we face and it is very well defined. I am afraid that if we enter discussions about "performance evaluation", we will keep on discussing for three years (wouldn't be the first time). But I am happy to be proven wrong.

I don't think we are trying to restrict ourselves to just representing samples. The goal of this SEP is representing experiments.

Let's please eliminate the word "sample" from this discussion thread. We are NOT talking about samples here. We are talking about clones and cell lines and batches but not about the physical tubes that contain them. Since I am one of the few people here who is actually doing experiments, let me make some definitions:

Batch = Implementation = Physical Realization: one particular realization of a design. Typical examples: a clonal population of cells, a batch of DNA extracted from one single clonal population, a batch of protein purified from one clonal culture, one clone of a manipulated cell line, one batch of cell-free extract. A batch is most often distributed over many samples.

Sample: one particular physical container, typically with a label, typically stored in one clearly defined location or shipped from A to B. A sample contains one or more types of molecules or cells from one or from different (but clearly defined) batches at defined concentrations, typically mixed with clearly defined solvents (e.g. water, buffers) or media. Samples accumulate a history of experimental manipulations.

Experiment = Test: one particular set of manipulations carried out in one lab. Methods are defined by a protocol. The starting point is one or more samples. The results are data and sometimes new samples.

Experiments are usually performed on one out of many samples of a batch. The results then presumably apply to the whole batch and all samples of it. So it is generally OK to attach experiments directly to a batch leaving out the sample information because this is what people most likely care about outside of your own lab and surroundings. Sample logistics and LIMS type of information is something we may want to discuss at some point but I propose we do not discuss it here and now. "Batch" is as far down as we need to go to represent the design - build - test cycle.
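The relationship between these three terms can be sketched in Python. This is only an illustration of the definitions above: the class names follow the definitions, each sample is simplified to hold material from a single batch (the definition allows several), and all labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    label: str  # one physical container, e.g. a labelled tube
    batch: str  # the batch (implementation) whose material it holds


@dataclass(frozen=True)
class Experiment:
    protocol: str                # methods are defined by a protocol
    samples: Tuple[Sample, ...]  # starting point: one or more samples


def experiments_by_batch(experiments: List[Experiment]) -> Dict[str, List[str]]:
    """Attach each experiment to the batch(es) of its samples,
    dropping the sample-level detail."""
    by_batch: Dict[str, List[str]] = defaultdict(list)
    for exp in experiments:
        for batch in sorted({s.batch for s in exp.samples}):
            by_batch[batch].append(exp.protocol)
    return dict(by_batch)


tube_a = Sample("tube A", "clone_7")
tube_b = Sample("tube B", "clone_7")
runs = [Experiment("sanger_fwd", (tube_a,)), Experiment("sanger_rev", (tube_b,))]
summary = experiments_by_batch(runs)
```

Both sequencing runs used different tubes, yet both end up attached to the same batch, which is the level proposed here for the design-build-test record.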

Greetings Raik

Does this make sense?

Chris

Sent from my iPhone

On Jul 26, 2017, at 1:02 PM, Raik Grünberg notifications@github.com wrote:

On Wed, Jul 26, 2017 at 11:45 AM, cjmyers notifications@github.com wrote:

I disagree. The “Experiment” class is the physical implementation class you are looking for. Maybe you just don’t like the name, which is fine. We can call it Implementation, but I think I like Experiment better, since an implementation is really about an experiment. Namely, you design something, you build it, and then you test it. To

OK, then this is where the confusion comes from. Physical implementations or realizations are particular clones, batches, cell lines etc which are NOT experiments. They are the result of experimental work -- for example, a single cloning experiment will generate lots of distinct clones or batches of DNA, some correct, some not. These batches each need to be validated by experiments and they can be used in later experiments or become the basis of further construction work. Some clones/cell lines/implementations may seem correct now but later experiments reveal that they have an issue.So we need to be able to trace them.

If we rename your "Experiment" class to "Implementation" and your / Brian's "Test" to "Experiment" the two drafts are almost identical.

me, this collection of steps is conducting an experiment, and the Experiment class I propose links them all together. Your approach is missing a link to the physical realization. My proposal has design and build separate because the design may not be correctly realized. It might be you get a different sequence than intended or the realization perhaps has scars that are not part of the design, or some other artifact from construction.

Modern experimental practice is not very tolerant of such unintended effects. E.g. with gene synthesis and lab-internal cloning you either get exactly what you want or the result goes to the bin. So in this particular context (arguably the most important one for SBOL), your result is either correctly built or not. If you really want to continue with an incorrect build, you had better make a new version of the CD, and provO could take care of linking it to the old one.

However, commonly a particular clone or cell line may have acquired mutations or changes outside of your design area which you are either unaware of (yet) or which you don't care about. E.g. offsite-cutting in genome engineering or mutations on the plasmid backbone. Or your design only said "knock out this gene" and 10 different clones from your CRISPR experiment have the correct knockout but all of course look slightly different at the sequence level. If you do not want to use provO for this but want to have an explicit field for "here is a more detailed description of this particular clone", then I am fine with that. As more experiments are run on a particular clone, the "build" Component or Module will also become more detailed. So there is still room for provO versioning to be used.

So in summary, I agree with your outline and would mostly change names:

Implementation (e.g. a bacterial clone)

  • design: URI reference to CD or MD design
  • build: URI reference to CD or MD build (may be same or different than design) -- leave out if identical?
  • tests: [0..*] links to ValidationExperiment objects
  • build-status: URI (design / under_construction / built)
  • validation-status: URI (not_tested / correct / incorrect / ambiguous)

ValidationExperiment (e.g. a single sequencing run)

  • Protocol: URI reference to a protocol
  • Data: [0..*] links to Data objects
  • evaluation: set of URI tags that depend on type of experiment (e.g. confirmed / corrupt / incomplete)

Data

  • Source: URI (perhaps a reference to an attachment object)
  • Format: URI
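The outline above can be sketched as plain Python dataclasses. This is an illustration under stated assumptions and not an SBOL library API. The Python class and field names simply mirror the outline. The status values are modelled as enums rather than URIs for brevity. The `build_uri` fallback is one possible reading of "leave out if identical?". All `example.org` URIs are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class BuildStatus(Enum):
    DESIGN = "design"
    UNDER_CONSTRUCTION = "under_construction"
    BUILT = "built"


class ValidationStatus(Enum):
    NOT_TESTED = "not_tested"
    CORRECT = "correct"
    INCORRECT = "incorrect"
    AMBIGUOUS = "ambiguous"


@dataclass
class Data:
    source: str  # URI, perhaps a reference to an attachment object
    format: str  # URI naming the data format


@dataclass
class ValidationExperiment:
    protocol: str                                        # URI of the protocol
    data: List[Data] = field(default_factory=list)       # [0..*]
    evaluation: List[str] = field(default_factory=list)  # tags, e.g. "confirmed"


@dataclass
class Implementation:
    design: str                  # URI of the CD or MD design
    build: Optional[str] = None  # URI of the CD or MD build; None if identical
    tests: List[ValidationExperiment] = field(default_factory=list)
    build_status: BuildStatus = BuildStatus.DESIGN
    validation_status: ValidationStatus = ValidationStatus.NOT_TESTED

    def build_uri(self) -> str:
        """Return the build URI, falling back to the design when it was left out."""
        return self.build if self.build is not None else self.design


# Three clones from one cloning experiment, all pointing to the same design.
design = "http://example.org/cd/knockout_design/1"
clones = [Implementation(design=design, build_status=BuildStatus.BUILT) for _ in range(3)]

# Only the first clone has been sequenced so far.
seq_run = ValidationExperiment(
    protocol="http://example.org/protocols/sanger_sequencing",
    evaluation=["confirmed"],
)
clones[0].tests.append(seq_run)
clones[0].validation_status = ValidationStatus.CORRECT
```

Each clone is a separate `Implementation` object, so the validation state of one clone does not leak into its siblings, while the shared `design` URI keeps them traceable to the same CD or MD.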

I agree that the iGEM tagging is not a very good example. However, for the more narrow scope of whether or not something has been correctly built (never mind whether it actually works as intended), we can define useful and universal tags. The trick is to keep the scope indeed limited to "construction as specified by design" and not to get dragged into "this works" or "this doesn't work".

I have no strong opinion about the Data class. I guess a generic container for attachments would be extremely useful also for other SBOL classes and this would be the same as the data object, would it not?

Greetings Raik

So, I’m standing by my proposal. I think it is the cleanest approach so far. It avoids making changes to existing classes, making it easier to use right away. It also means that we avoid the duplicate CDs/MDs that the original proposal was forcing on us. It makes it really easy to determine whether a design has been tested: simply look for Experiments referencing this design. Finally, it is not really all that different from what you have below, except that I see the “Experiment” as the organizing class.
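The lookup described here ("look for Experiments referencing this design") could be sketched as follows. This is a hypothetical illustration rather than an SBOL library API. The `Experiment` fields follow the Design/Build/Tests outline of the unified proposal, tests are represented only by their URIs, and the helper names and `example.org` URIs are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Experiment:
    design: str                                     # URI of the CD or MD design
    build: Optional[str] = None                     # URI of the CD or MD build, if recorded
    tests: List[str] = field(default_factory=list)  # URIs of Test objects, [0..*]


def experiments_for(design_uri: str, experiments: Iterable[Experiment]) -> List[Experiment]:
    """Return every Experiment that references the given design."""
    return [e for e in experiments if e.design == design_uri]


def is_tested(design_uri: str, experiments: Iterable[Experiment]) -> bool:
    """A design counts as tested if any referencing Experiment links to at least one Test."""
    return any(e.tests for e in experiments_for(design_uri, experiments))


document = [
    Experiment(design="http://example.org/cd/promoter_a/1",
               tests=["http://example.org/test/plate_reader_run_1"]),
    Experiment(design="http://example.org/cd/promoter_b/1"),
]
```

The design objects themselves are never modified. The only thing needed to answer "has this been tested?" is a scan over the Experiment objects, for example `is_tested("http://example.org/cd/promoter_a/1", document)`.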

On Jul 23, 2017, at 10:36 AM, Raik Grünberg < notifications@github.com> wrote:

Hi Chris,

I also think the proposals can be unified. However, your suggestion is still missing what I think is the most important point of the discussion: we need a class for "physical implementation" that is not CD or MD (but refers to it). This is because we need to be able to say: "These results apply to this experimental batch / clone / particular batch of cells". Without this concept, CD and MD are turned into representations of experimental clones and batches (as is done in this SEP), which completely violates the scope of what CD and MD are supposed to represent, will confuse us for years to come, and will trigger an avalanche of validation rules that are not needed if we keep this clearly separated. One model would be:

Implementation

  • design -> ComponentDefinition
  • validation -> ValidationExperiment

ValidationExperiment

  • data
  • protocol
  • validation_result: "confirmed" / "failed" / "ambiguous" / "unknown"

ComponentDefinition

  • productionStatus: built
  • implementations -> ...
  • prov-o: derived_from -> ProvO record pointing to original design if different

Greetings Raik

On Sun, Jul 23, 2017 at 10:35 AM, cjmyers notifications@github.com wrote:

Hi,

I’m going to take a shot at seeing if I can try to unify the proposals. How about?

Experiment

  • Design: URI reference to CD or MD design
  • Build: URI reference to CD or MD build (may be same or different than design)
  • Tests: [0..*] links to Test objects

Test

  • Protocol: URI reference to a protocol
  • Data: [0..*] links to Data objects

Data

  • Source: URI (perhaps a reference to an attachment object)
  • Format: URI

Chris

On Jul 22, 2017, at 11:41 PM, Raik Grünberg < notifications@github.com> wrote:

We are talking about physical implementations (or experimental realizations) of a given design. This is not related to structure at all. Different implementations (for example different clones) need to be distinguishable because they may or may not be validated by experiments. They may all originate from the same experiment or they may be created with different methods in different labs but they all point to the same design (CD or MD).

Let me try from another angle: We need a new class "Implementation" for the same reason that we have "Component" (a.k.a. SubPart) instead of creating a new "ComponentDefinition" each time a part is re-used in a sequence design. Or again from another angle: we are crossing a boundary here from design to experiment. Using ComponentDefinition or ModuleDefinition to enumerate bacterial colonies, cell lines or enzyme batches in the lab is just a really bad idea.

Good night, Raik

On Sat, Jul 22, 2017 at 9:49 PM, Jacob Beal < notifications@github.com> wrote:

I absolutely agree with you that a CD represents structure, and that we should be able to use it to describe real, physical structures. That is exactly why I do not want to use "shallow" CDs as pointers to "real" CDs. Likewise for MDs.

I think we need to split out the "pointer" into its own class, whatever the right name turns out to be: "Sample", "Build", "Aliquot", "PhysicalThing", or whatever else best fits a representation of a physically instantiated design that points to a CD or MD that describes it fully.


palchicz commented 5 years ago

Closing in accordance with changes to SEP issue tracking rules detailed in SEP 001 https://github.com/SynBioDex/SEPs/commit/bcbbcab01a2b01d5055fc55d03c949fb227e37d2#diff-44cec2aabf4c066f9a54ac4ef6634b9b