SynBioDex / libSBOLj

Java Library for Synthetic Biology Open Language (SBOL)
Apache License 2.0

Mystery conversion failure #622

Closed: jakebeal closed this issue 2 years ago

jakebeal commented 2 years ago

Attempting to convert this file (error.xml.txt) to GenBank causes libSBOLj's conversion routine to crash opaquely, even though the validator reports that the file is valid.

Error report:

    Validation successful, no errors.
    Exception in thread "main" java.lang.NullPointerException
        at org.sbolstandard.core2.GenBank.writeReferences(GenBank.java:514)
        at org.sbolstandard.core2.GenBank.writeComponentDefinition(GenBank.java:106)
        at org.sbolstandard.core2.GenBank.write(GenBank.java:124)
        at org.sbolstandard.core2.GenBank.write(GenBank.java:138)
        at org.sbolstandard.core2.SBOLWriter.write(SBOLWriter.java:230)
        at org.sbolstandard.core2.SBOLWriter.write(SBOLWriter.java:149)
        at org.sbolstandard.core2.SBOLWriter.write(SBOLWriter.java:210)
        at org.sbolstandard.core2.SBOLValidate.validate(SBOLValidate.java:2767)
        at org.sbolstandard.core2.SBOLValidate.main(SBOLValidate.java:3028)

jakebeal commented 2 years ago

Looks like the issue is triggered by annotations on a ComponentDefinition from the NCBI namespace (http://www.ncbi.nlm.nih.gov/genbank). A workaround for this bug is simply to strip all annotations from ComponentDefinitions.

This is the kludge that I am using in Python:

    # Namespaces whose properties are kept; anything else (e.g. the NCBI GenBank
    # namespace) is treated as an annotation and dropped.
    keepers = ('http://sbols.org/v2', 'http://www.w3.org/ns/prov', 'http://purl.org/dc/terms/',
               'http://sboltools.org/backport')
    for c in doc2.componentDefinitions:  # wipe out all annotation properties
        c.properties = {p: v for p, v in c.properties.items() if p.startswith(keepers)}

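Since the crash appears to be triggered specifically by annotations in the NCBI namespace, a narrower variant of the workaround might drop only those properties and keep every other annotation. This is a minimal sketch on a plain dict, assuming (as the kludge does) that the properties behave like a dict keyed by property URI; the function name `drop_ncbi_annotations` and the example property URIs and values are hypothetical, and it is untested whether removing only the NCBI-namespace properties is always enough to avoid the crash.

```python
from typing import Dict, List

# Namespace reported in this issue as the trigger for the crash.
NCBI_NS = 'http://www.ncbi.nlm.nih.gov/genbank'


def drop_ncbi_annotations(properties: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of `properties` without any key in the NCBI GenBank namespace."""
    return {p: v for p, v in properties.items() if not p.startswith(NCBI_NS)}


# Hypothetical property dict keyed by property URI; the NCBI key and the
# values are made up for illustration.
props = {
    'http://sbols.org/v2#role': ['http://identifiers.org/so/SO:0000804'],
    'http://www.ncbi.nlm.nih.gov/genbank#reference': ['hypothetical value'],
}
print(drop_ncbi_annotations(props))
```

Applied to a document, this would be used in place of the `keepers` filter, e.g. `c.properties = drop_ncbi_annotations(c.properties)` inside the same loop.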
cjmyers commented 2 years ago

The issue with this is that it appears these SBOL records were converted from GenBank in the first place. The references in the GenBank file are not of the standard form, but rather just point to a URL, for example:

The converter was not expecting this, since GenBank references normally look like this:

    REFERENCE   1  (bases 1 to 5028)
      AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
      TITLE     Cloning and sequence of REV7, a gene whose function is required for
                DNA damage-induced mutagenesis in Saccharomyces cerevisiae
      JOURNAL   Yeast 10 (11), 1503-1509 (1994)
       PUBMED   7871890

(From the NCBI GenBank sample record: https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html)

Unfortunately, this is another example of the non-standard nature of GenBank.

In any case, a simple fix is to skip over these URL-only references in the conversion back to GenBank.
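The skip-on-write idea can be sketched as follows. This is a minimal Python illustration, not the actual libSBOLj (Java) change: the dict-based `Reference` shape, its field names, and `format_references` are all hypothetical. A reference that has none of the standard fields (for example, one that only carries a URL) is skipped rather than dereferenced, and the remaining references are written with the GenBank keyword layout shown in the sample above.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of one GenBank reference; these field names
# are assumptions for illustration, not libSBOLj's data model.
Reference = Dict[str, Optional[str]]

STANDARD_FIELDS = ('authors', 'title', 'journal', 'pubmed')


def format_references(refs: List[Reference], seq_len: int) -> List[str]:
    """Emit GenBank-style REFERENCE lines, skipping references that carry
    none of the standard fields (e.g. ones that only point to a URL)."""
    lines: List[str] = []
    number = 0
    for ref in refs:
        if not any(ref.get(f) for f in STANDARD_FIELDS):
            continue  # non-standard reference: skip it instead of failing
        number += 1  # numbering counts only the references actually written
        lines.append(f"REFERENCE   {number}  (bases 1 to {seq_len})")
        for key in ('authors', 'title', 'journal'):
            if ref.get(key):
                # Keyword indented 2 spaces and padded so values start at column 13
                lines.append(f"  {key.upper():<10}{ref[key]}")
        if ref.get('pubmed'):
            lines.append(f"   PUBMED   {ref['pubmed']}")
    return lines


refs = [
    {'url': 'http://example.org/hypothetical-record'},  # URL-only: skipped
    {'authors': 'Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.',
     'title': 'Cloning and sequence of REV7',  # title shortened for brevity
     'journal': 'Yeast 10 (11), 1503-1509 (1994)',
     'pubmed': '7871890'},
]
for line in format_references(refs, 5028):
    print(line)
```

Two simplifications are worth noting: skipped references do not consume a reference number, so the output stays consecutively numbered, and long values are not wrapped onto continuation lines the way real GenBank records are.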

cjmyers commented 2 years ago

I believe this is now fixed.