SynBioDex / SBOL-specification

The Synthetic Biology Open Language (SBOL)
http://sbolstandard.org

Consolidate recommended sequence encodings? #10

Closed jakebeal closed 4 years ago

jakebeal commented 9 years ago

Somebody needs to revisit the question of which sequence encodings are recommended for the next version of the spec. We looked at EDAM on NCBO as a nice example that integrates many ontologies in biology. This could be a possible way to simplify to one ontology source rather than many as a best practice.

[imported from LaTeX]

goksel commented 9 years ago

Here are the terms from EDAM:
SMILES: http://identifiers.org/edam/format_1196
InChI: http://identifiers.org/edam/format_1197
inchikey: http://identifiers.org/edam/format_1199
dna: http://identifiers.org/edam/format_1212 (Add a validation rule to indicate that the sequence must be valid according to the IUPAC DNA encoding at http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html)
rna: http://identifiers.org/edam/format_1213
aminoacid: http://identifiers.org/edam/format_1219 (Add a validation rule to indicate that the sequence must be valid according to the IUPAC AminoAcid encoding at http://www.chem.qmul.ac.uk/iupac/AminoAcid/)
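The validation rules suggested above could look roughly like the following sketch. The URIs are the EDAM terms listed in this comment; the alphabets are an assumption (standard bases plus IUPAC ambiguity codes, and the 20 standard amino acids plus B, Z and X), and the function name `is_valid_sequence` is hypothetical, not part of any SBOL library.

```python
import re

# EDAM format URIs, copied from the list above.
EDAM_DNA = "http://identifiers.org/edam/format_1212"
EDAM_RNA = "http://identifiers.org/edam/format_1213"
EDAM_AMINO_ACID = "http://identifiers.org/edam/format_1219"

# Assumed alphabets: the exact character sets a validation rule should
# accept are not settled in this thread.
_AMBIGUITY = "RYSWKMBDHVN"
_PATTERNS = {
    EDAM_DNA: re.compile(f"[ACGT{_AMBIGUITY}]*", re.IGNORECASE),
    EDAM_RNA: re.compile(f"[ACGU{_AMBIGUITY}]*", re.IGNORECASE),
    EDAM_AMINO_ACID: re.compile("[ACDEFGHIKLMNPQRSTVWYBZX]*", re.IGNORECASE),
}


def is_valid_sequence(elements: str, encoding_uri: str) -> bool:
    """Check `elements` against the alphabet assumed for `encoding_uri`.

    Encodings without a pattern here (e.g. SMILES, InChI) are not checked
    and are reported as valid.
    """
    pattern = _PATTERNS.get(encoding_uri)
    if pattern is None:
        return True
    return pattern.fullmatch(elements) is not None
```

For example, `is_valid_sequence("ACGTN", EDAM_DNA)` passes under these assumptions, while a DNA sequence containing `U` does not.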

jakebeal commented 7 years ago

Note: the commit referenced the wrong issue number.

jakebeal commented 7 years ago

Paraphrase of email conversation:

We should first agree on whether we want this to be backward compatible or not. Updating the table with the new URIs won't be backward compatible.

CURRENT STATE OF THE SPEC: The encoding property can have only one URI value.

Table 1, sequence encoding types:
Type | URI
DNA, RNA | http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html
Protein | http://www.chem.qmul.ac.uk/iupac/AminoAcid/
SMILES | http://www.opensmiles.org/opensmiles.html | SmallMolecule

Solution 1 - NOT BACKWARD COMPATIBLE: Update Table 1 with new URIs

Solution 2 - BACKWARD COMPATIBLE: Extend Table 1 with new entries only (such as InChI), keeping the existing values from Table 1 above.

Solution 3 - BACKWARD COMPATIBLE: Update the encoding property to hold a set of URIs rather than a single URI, so that it can carry both the old and the new values. Update Table 1 with the new URIs below.
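As a rough illustration of Solution 3, the sketch below pairs each type with a set holding both its existing Table 1 URI and the EDAM URI from the earlier comment. The `ENCODING_URIS` table and the `types_for` helper are hypothetical names used only for this example; nothing here is taken from the spec or from an existing library.

```python
# Existing Table 1 URIs.
OLD_NUCLEIC = "http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html"
OLD_PROTEIN = "http://www.chem.qmul.ac.uk/iupac/AminoAcid/"
OLD_SMILES = "http://www.opensmiles.org/opensmiles.html"

# Hypothetical table: each type carries its old URI and its EDAM URI.
ENCODING_URIS = {
    "DNA": frozenset({OLD_NUCLEIC, "http://identifiers.org/edam/format_1212"}),
    "RNA": frozenset({OLD_NUCLEIC, "http://identifiers.org/edam/format_1213"}),
    "Protein": frozenset({OLD_PROTEIN, "http://identifiers.org/edam/format_1219"}),
    "SMILES": frozenset({OLD_SMILES, "http://identifiers.org/edam/format_1196"}),
}


def types_for(uri: str) -> list[str]:
    """Return the sorted type names whose URI set contains `uri`."""
    return sorted(name for name, uris in ENCODING_URIS.items() if uri in uris)
```

Under this arrangement a tool that only knows the old nucleic acid URI still finds a match (for both DNA and RNA, since the old URI does not distinguish them), while a tool using the EDAM URIs gets a single type back.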

The non-backward-compatible solution would make this a 3.0 change. As for which of the backward-compatible solutions is best... I don't really like either of them, and there's also an issue that Sid pointed out on the Editor's call: switching to the identifiers.org URIs has the disadvantage that it makes the format totally inscrutable from the URI (e.g., why should I associate "1212" with "DNA"?).

palchicz commented 5 years ago

This issue is being forced by https://github.com/SynBioDex/SBOL-specification/issues/191, where the URIs no longer dereference correctly. Although the dereferencing is an issue, changing the libraries to use the new URIs will introduce many more breaking changes. Therefore, we decided that this should be a 3.0 change (option 1).

jakebeal commented 4 years ago

Is this one now complete?

cjmyers commented 4 years ago

I think we did update encodings, so can close.