SynBioDex / SBOL-specification

The Synthetic Biology Open Language (SBOL)
http://sbolstandard.org

Stand-alone representation of externally defined components #439

Open jakebeal opened 3 years ago

jakebeal commented 3 years ago

While the ExternallyDefined feature makes it very lightweight to define elements of a design that are taken from other databases (e.g., small molecules, proteins, vendor reagents, kits), I don't think that we've made a clear recommendation for a best practice on how to refer to such elements in other portions of a DBTL workflow.

Consider an Implementation that is being used to represent an aliquot of glucose. In the example on page 47 of the COMBINE tutorial, the glucose Implementation is defined by reference to a Component for glucose. Somewhere in there, however, we definitely want to have a link to a canonical database like ChEBI or PubChem.

I see three potential ways to do this:

  1. Keep the glucose Component and give it two types, one of which is the general SBO "Small Molecule" and the other being https://pubchem.ncbi.nlm.nih.gov/compound/5793
    • Advantages: clearly legal and coherent within the SBOL framework.
    • Disadvantages: once again we have to make a proliferating collection of empty "wrapper" objects for external compounds, just like we were trying to avoid by using ExternallyDefined. Wrappers make equality testing difficult, because there are likely to be many wrappers in different collections. May be difficult to keep coherent with usages in ExternallyDefined. Needs to have the "dual type" best practice added to the specification.
  2. Point the wasDerivedFrom or built link of Implementation directly at https://pubchem.ncbi.nlm.nih.gov/compound/5793.
    • Advantages: similar to current use of wasDerivedFrom to indicate component sources, no need for a change in the data model. The specification can recommend that one SHOULD NOT re-represent external objects with a wrapper class.
    • Disadvantages: searching for usages of glucose needs to operate at the property level rather than the object level. Not clear if this is legal within the SBOL framework, or if the current use of wasDerivedFrom is either, since the SBOL specification restricts a wasDerivedFrom to pointing at SBOL objects. Can't tell if you should expect an SBOL object or a non-SBOL object at the other end of the link.
  3. Add a new property (e.g., external) to the Implementation that is either an alternative to built or is an indicator of how to interpret it.
    • Advantages: Same as option 2, but with a clean distinction for how to interpret property values.
    • Disadvantages: Added complexity for Implementation objects. Searching for usages of glucose needs to operate at the property level rather than the object level.
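
For illustration, the three options above can be sketched as RDF-style triples, along with what a "find every Implementation of glucose" search has to do in each case. This is a minimal sketch and not library code: triples are plain Python tuples, the `https://example.org/lab/` namespace and object names are hypothetical, the `external` property is the proposed (not existing) property from option 3, and the SBO term used for "Small Molecule" is an assumption.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
SBOL = "http://sbols.org/v3#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

GLUCOSE = "https://pubchem.ncbi.nlm.nih.gov/compound/5793"
SMALL_MOLECULE = "https://identifiers.org/SBO:0000247"  # assumed SBO term
EX = "https://example.org/lab/"  # hypothetical namespace for local objects
ALIQUOT = EX + "glucose_aliquot"  # hypothetical Implementation


def option1():
    """Wrapper Component with two types; Implementation points at the wrapper."""
    wrapper = EX + "glucose"
    return {
        (wrapper, RDF_TYPE, SBOL + "Component"),
        (wrapper, SBOL + "type", SMALL_MOLECULE),
        (wrapper, SBOL + "type", GLUCOSE),
        (ALIQUOT, RDF_TYPE, SBOL + "Implementation"),
        (ALIQUOT, SBOL + "built", wrapper),
    }


def option2():
    """Implementation points built directly at the external record."""
    return {
        (ALIQUOT, RDF_TYPE, SBOL + "Implementation"),
        (ALIQUOT, SBOL + "built", GLUCOSE),
    }


def option3():
    """Implementation uses a separate, hypothetical `external` property."""
    return {
        (ALIQUOT, RDF_TYPE, SBOL + "Implementation"),
        (ALIQUOT, SBOL + "external", GLUCOSE),  # proposed property, not in SBOL
    }


def implementations_of(triples, external_uri):
    """Find subjects that refer to external_uri directly or through a wrapper."""
    # Object-level search (option 1): first collect wrapper Components.
    wrappers = {s for s, p, o in triples
                if p == SBOL + "type" and o == external_uri}
    targets = wrappers | {external_uri}
    # Property-level search (options 2 and 3): inspect the link properties.
    link_props = {SBOL + "built", PROV + "wasDerivedFrom", SBOL + "external"}
    return sorted({s for s, p, o in triples
                   if p in link_props and o in targets})


for make in (option1, option2, option3):
    print(make.__name__, implementations_of(make(), GLUCOSE))
```

In this sketch, option 1 needs an extra pass to discover wrappers, and two collections could each carry their own wrapper for the same compound; options 2 and 3 need no wrapper, but the search has to know which properties may hold an external URI.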

Of these three options, I think I most prefer adding an optional external property to the Implementation object. What do others think?

jakebeal commented 3 years ago

Correction to what I've written above: wasDerivedFrom is explicitly allowed to point to non-SBOL objects.

What we do not currently have is a way to tell whether we should expect a URI to resolve to an SBOL object or a non-SBOL record.
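
To make the gap concrete, here is a minimal sketch of the only check available today without dereferencing the URI: asking whether the locally loaded document happens to type the target with a class in the SBOL 3 namespace. The function name and the example URIs under `https://example.org/lab/` are hypothetical, and the check says nothing about what a URI resolves to when it is absent from the document.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SBOL = "http://sbols.org/v3#"


def is_local_sbol_object(triples, uri):
    """True if the loaded document types `uri` with an SBOL 3 class."""
    return any(s == uri and p == RDF_TYPE and o.startswith(SBOL)
               for s, p, o in triples)


# Hypothetical document: one local Component and one Implementation.
doc = {
    ("https://example.org/lab/glucose", RDF_TYPE, SBOL + "Component"),
    ("https://example.org/lab/glucose_aliquot", RDF_TYPE, SBOL + "Implementation"),
    ("https://example.org/lab/glucose_aliquot", SBOL + "built",
     "https://example.org/lab/glucose"),
}

print(is_local_sbol_object(doc, "https://example.org/lab/glucose"))
print(is_local_sbol_object(doc, "https://pubchem.ncbi.nlm.nih.gov/compound/5793"))
```

A negative answer here is ambiguous: the target may be a non-SBOL record, or an SBOL object defined in another document, which is exactly the distinction the link itself does not carry.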

jakebeal commented 2 years ago

A fourth option, following SEP 054, is to treat glucose as an import from a dissociated import package. This allows it to be imported to SBOL as a fixed "translation" rather than a unique "wrapper", meaning that we no longer have a problem of proliferating copies.

Gonza10V commented 1 year ago

Do you really need to point to an implementation of Glucose as a Component? You could use ExternallyDefined glucose as a feature for a Glucose 80 percent solution Component and point to that from the Implementation. Another way to approach this is by having ExternallyDefined Components. At the moment ExternallyDefined is a Feature; would it make sense to modify ExternallyDefined to also be a Component, or to create a new class for ExternallyDefinedComponent?

jakebeal commented 1 year ago

@Gonza10V In our current validation rules, sbol3-12301 says "Each prov:wasDerivedFrom property of an Implementation MUST refer to a Component", so we cannot point it at a Feature (including ExternallyDefined), and there is no guarantee that such a Feature will even exist. Your other proposal is similar to option 1, which I find less desirable than the other alternatives.