SynBioDex / sboljs3

A library for the Synthetic Biology Open Language (SBOL), written in TypeScript, for JavaScript/TypeScript applications in the browser or Node.js.

SBOL 3->2 needs to remap sequence encodings and component types #16

Status: Open. jakebeal opened this issue 2 years ago.

jakebeal commented 2 years ago

My current workaround in Python:

    import sbol2
    import sbol3

    # doc3 is the sbol3.Document that is about to be converted to SBOL2.
    # remap sequence encodings (SMILES is left as the SBOL3 term):
    encoding_remapping = {
        sbol3.IUPAC_DNA_ENCODING: sbol2.SBOL_ENCODING_IUPAC,
        sbol3.IUPAC_PROTEIN_ENCODING: sbol2.SBOL_ENCODING_IUPAC_PROTEIN,
        sbol3.SMILES_ENCODING: sbol3.SMILES_ENCODING
    }
    for s in (o for o in doc3.objects if isinstance(o, sbol3.Sequence)):
        if s.encoding in encoding_remapping:
            s.encoding = encoding_remapping[s.encoding]
    # remap component types (SBO terms -> BioPAX terms); unmapped types are kept:
    type_remapping = {
        sbol3.SBO_DNA: sbol2.BIOPAX_DNA,
        sbol3.SBO_RNA: sbol2.BIOPAX_RNA,
        sbol3.SBO_PROTEIN: sbol2.BIOPAX_PROTEIN,
        sbol3.SBO_SIMPLE_CHEMICAL: sbol2.BIOPAX_SMALL_MOLECULE,
        sbol3.SBO_NON_COVALENT_COMPLEX: sbol2.BIOPAX_COMPLEX
    }
    for c in (o for o in doc3.objects if isinstance(o, sbol3.Component)):
        c.types = [type_remapping.get(t, t) for t in c.types]
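For writing test cases without pySBOL2/pySBOL3 installed, the same remapping can be expressed over plain URI strings. This is a minimal sketch, not a proposed API: the table and function names are hypothetical, the URIs are our reading of the SBOL 2 and SBOL 3.0.1 specifications and should be checked against the pySBOL2/pySBOL3 constants, and, unlike the workaround above, it assumes SMILES should map to the SBOL2 SMILES URI rather than keep the SBOL3 one.

```python
from typing import Dict, Iterable, List

# SBOL3 sequence encodings (EDAM terms) -> SBOL2 encodings.
# URIs are our reading of the SBOL 3.0.1 and SBOL 2 specifications (verify before use).
ENCODING_3_TO_2: Dict[str, str] = {
    "https://identifiers.org/edam:format_1207": "http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html",  # IUPAC DNA/RNA
    "https://identifiers.org/edam:format_1208": "http://www.chem.qmul.ac.uk/iupac/AminoAcid/",  # IUPAC protein
    "https://identifiers.org/edam:format_1196": "http://www.opensmiles.org/opensmiles.html",  # SMILES (assumed SBOL2 URI)
}

BIOPAX = "http://www.biopax.org/release/biopax-level3.owl#"

# SBOL3 Component types (SBO terms) -> SBOL2 ComponentDefinition types (BioPAX terms).
TYPE_3_TO_2: Dict[str, str] = {
    "https://identifiers.org/SBO:0000251": BIOPAX + "DnaRegion",      # DNA
    "https://identifiers.org/SBO:0000250": BIOPAX + "RnaRegion",      # RNA
    "https://identifiers.org/SBO:0000252": BIOPAX + "Protein",        # protein
    "https://identifiers.org/SBO:0000247": BIOPAX + "SmallMolecule",  # simple chemical
    "https://identifiers.org/SBO:0000253": BIOPAX + "Complex",        # non-covalent complex
}


def remap_term(term: str, mapping: Dict[str, str]) -> str:
    """Return the SBOL2 equivalent of an SBOL3 term, or the term unchanged if unmapped."""
    return mapping.get(term, term)


def remap_terms(terms: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Remap a list of terms (e.g. a Component's types), preserving order."""
    return [remap_term(t, mapping) for t in terms]
```

Unmapped terms pass through unchanged, matching the workaround's behavior for types.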
jakebeal commented 2 years ago

Orientations also appear to be failing in the same manner.
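If orientations fail the same way, they could presumably be handled with one more lookup table. A sketch, assuming SBOL3 orientations use either the Sequence Ontology forward/reverse terms (SO:0001030, SO:0001031) or SBOL3's own inline/reverseComplement terms, and that SBOL2 only accepts its inline and reverseComplement URIs; the names here are hypothetical.

```python
from typing import Dict, Optional

SBOL2_INLINE = "http://sbols.org/v2#inline"
SBOL2_REVERSE_COMPLEMENT = "http://sbols.org/v2#reverseComplement"

# SBOL3 orientation -> SBOL2 orientation (assumed values; check against the specs).
ORIENTATION_3_TO_2: Dict[str, str] = {
    "https://identifiers.org/SO:0001030": SBOL2_INLINE,              # SO "forward"
    "https://identifiers.org/SO:0001031": SBOL2_REVERSE_COMPLEMENT,  # SO "reverse"
    "http://sbols.org/v3#inline": SBOL2_INLINE,
    "http://sbols.org/v3#reverseComplement": SBOL2_REVERSE_COMPLEMENT,
}


def remap_orientation(orientation: Optional[str]) -> Optional[str]:
    """Map an SBOL3 orientation to SBOL2.

    A missing orientation (None) stays None; unknown values pass through unchanged.
    """
    if orientation is None:
        return None
    return ORIENTATION_3_TO_2.get(orientation, orientation)
```

This table is many-to-one, so unlike the encoding and type tables it cannot simply be inverted for an SBOL 2->3 conversion.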

isaacguerreiros commented 2 years ago

I took a look at this issue today, and it looks like we should move the remapping you wrote into sbolgraph.

The workaround @jakebeal wrote can be found here: https://github.com/iGEM-Engineering/iGEM-distribution/blob/a697bfcb9da4db38da07e19b379968013f284a35/scripts/scriptutils/conversions.py#L121

The encoding constants can be found here and here. Should we move these constants to sbolgraph, or will most of them be unnecessary? bioterms doesn't have the encoding constants either.

Last but not least, I cloned the sbolgraph repo and ran npm install and npm test, and it appears the project doesn't have any tests. After a few minutes I cloned the SBOLTestSuite inside the sbolgraph directory and ran npm test again, but the only output I received was this:

    sbolgraph@0.45.0 test
    bash test.sh
    🔄 Converting file: SBOLTestSuite/GenBank/EF587312.gb

...and that's it.

I think it would be useful to be able to run and write some tests for this issue, and also to find out whether it's necessary to move the constants into sbolgraph.

jakebeal commented 2 years ago

@isaacguerreiros I believe that the conversion tests in sbol-utilities will be a good set of test cases to use here. The same conversions should hold, given that this issue is essentially asking for the corrective RDF changes in that library to be brought upstream into this library.
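One property such conversion tests could check is that no SBOL3 terms survive in the encoding, type, or orientation slots of the SBOL2 output. A minimal sketch, assuming the converted document is available as plain (subject, predicate, object) string triples; the predicate URIs are the SBOL2 ones as we understand them, and the term list is assembled from the encodings, types, and orientations discussed in this thread rather than taken from any test suite.

```python
from typing import Iterable, List, Set, Tuple

# A triple is (subject, predicate, object); plain strings stand in for RDF terms.
Triple = Tuple[str, str, str]

# SBOL2 predicates whose values this issue is about.
CHECKED_PREDICATES: Set[str] = {
    "http://sbols.org/v2#encoding",
    "http://sbols.org/v2#type",
    "http://sbols.org/v2#orientation",
}

# SBOL3 terms that should no longer appear in those slots after a 3 -> 2 conversion
# (assumed list; extend as more remappings are identified).
SBOL3_ONLY_TERMS: Set[str] = {
    "https://identifiers.org/edam:format_1207",
    "https://identifiers.org/edam:format_1208",
    "https://identifiers.org/edam:format_1196",
    "https://identifiers.org/SBO:0000251",
    "https://identifiers.org/SBO:0000250",
    "https://identifiers.org/SBO:0000252",
    "https://identifiers.org/SBO:0000247",
    "https://identifiers.org/SBO:0000253",
    "https://identifiers.org/SO:0001030",
    "https://identifiers.org/SO:0001031",
    "http://sbols.org/v3#inline",
    "http://sbols.org/v3#reverseComplement",
}


def leftover_sbol3_terms(triples: Iterable[Triple]) -> List[Triple]:
    """Return triples in converted SBOL2 output that still carry an SBOL3 term in an
    encoding, type, or orientation slot. An empty list means the check passed."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if p in CHECKED_PREDICATES and o in SBOL3_ONLY_TERMS
    ]
```

If the converted output is loaded with rdflib, the triples can be obtained with `[(str(s), str(p), str(o)) for s, p, o in graph]`.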

With regard to the constants: anything that appears in the SBOL specification is, I think, fine to encode in the library. If you disagree, @udp, please comment.

@isaacguerreiros : do you need any other information in order to proceed?

isaacguerreir commented 2 years ago

I analyzed some of the code, and it appears that the SBOL specification constants in pySBOL3 and bioterms differ. bioterms uses the same encoding URIs for SBOL2 and SBOL3 (see the permalinks for the exact lines), while pySBOL3 uses different identifiers from identifiers.org.
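A comparison like this can be made repeatable with a small check of a library's constants against the specification's values. A minimal sketch: the constant names and the `stale` example below are hypothetical stand-ins, not bioterms' actual identifiers, and the URIs are our reading of SBOL 3.0.1.

```python
from typing import Dict, List, Tuple

# Encoding URIs as we read them in the SBOL 3.0.1 specification (verify before use).
SPEC_SBOL3_ENCODINGS: Dict[str, str] = {
    "IUPAC_DNA": "https://identifiers.org/edam:format_1207",
    "IUPAC_PROTEIN": "https://identifiers.org/edam:format_1208",
    "SMILES": "https://identifiers.org/edam:format_1196",
}


def constant_mismatches(library: Dict[str, str], spec: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (name, library_value, spec_value) for each spec term the library gets
    wrong, sorted by name. A term missing from the library is reported with ''."""
    return [
        (name, library.get(name, ""), uri)
        for name, uri in sorted(spec.items())
        if library.get(name) != uri
    ]


# Hypothetical library constants that reuse an SBOL2 URI and lack SMILES.
stale = {
    "IUPAC_DNA": "http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html",
    "IUPAC_PROTEIN": "https://identifiers.org/edam:format_1208",
}
print(constant_mismatches(stale, SPEC_SBOL3_ENCODINGS))
```

Here the check flags both the outdated IUPAC DNA term and the missing SMILES term.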

It looks to me like, if we make the bioterms and pySBOL3 encoding identifiers equal, it will no longer be necessary to remap sequence encodings.

My understanding is this: because bioterms' SBOL3 and SBOL2 encoding terms are both the same as pySBOL2's, it looks like the conversion depends on this remapping. But if bioterms and pySBOL3 both agree with the specification for encodings, this remapping step may become unnecessary.

My pull request to bioterms is an attempt to resolve this.

It would also be good to start discussing how I could test this change: #19

isaacguerreir commented 2 years ago

One last thing: I could not find the SMILES encoding in bioterms or sbolgraph. Is this a concern? Judging by the remapping you wrote, @jakebeal, this could be a problem.

jakebeal commented 2 years ago

@isaacguerreir The pySBOL3 constants follow the SBOL 3.0.1 specification. If I'm understanding the constants file here correctly, it looks like the terms you identify just didn't get updated to their new values yet.

I also agree that the SMILES term just isn't there; a search of the library doesn't turn it up anywhere.

isaacguerreir commented 2 years ago

Perfect. So the bioterms pull request could resolve the first part of the problem.

isaacguerreir commented 2 years ago

I took a look, and the same problem occurs with the SBOL3 specification terms for types. I added similar changes to the PR to correct the type remapping problem.