SynBioDex / libSBOLj

Java Library for Synthetic Biology Open Language (SBOL)
Apache License 2.0

Add Support for REGION Tag to GenBank-to-SBOL Conversion #554

Closed. nroehner closed this issue 5 years ago.

nroehner commented 6 years ago

On the NCBI website, it is possible to change the region shown for a GenBank entry. When exporting the selected region to a GenBank file, the ACCESSION tag of the GenBank file is extended with a REGION tag that indicates what portion of the source GenBank entry is included in the GenBank file.
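A minimal sketch of how a converter might detect this tag, assuming the ACCESSION line has the form `ACCESSION   <accession> REGION: <location>` as described above, with a single accession and a location that contains no spaces. The class `AccessionRegionParser`, its `parse` method, and the accession `AB000001` are hypothetical illustrations. They are not part of libSBOLj.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of libSBOLj): splits a GenBank ACCESSION line
// into its accession number and the optional REGION value appended by NCBI.
public class AccessionRegionParser {

    // Assumed layout: "ACCESSION", whitespace, one accession, then optionally
    // "REGION:" followed by a location without spaces.
    private static final Pattern ACCESSION_LINE =
            Pattern.compile("^ACCESSION\\s+(\\S+)(?:\\s+REGION:\\s*(\\S+))?\\s*$");

    /** Returns {accession, region}; region is null when no REGION tag is present. */
    public static String[] parse(String line) {
        Matcher m = ACCESSION_LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a supported ACCESSION line: " + line);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // "AB000001" is a placeholder accession used only for illustration.
        String[] plain = parse("ACCESSION   AB000001");
        String[] withRegion = parse("ACCESSION   AB000001 REGION: complement(1379..2083)");
        System.out.println(plain[0] + " region=" + plain[1]);
        System.out.println(withRegion[0] + " region=" + withRegion[1]);
    }
}
```

A line that lists several secondary accessions would not match this pattern. A real converter would need to handle that case separately.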

This feature request is for GenBank-to-SBOL conversion to support encoding the REGION tag of a GenBank file as a genbank property in the resulting SBOL file. For example,

REGION: 1379..2083

would be encoded as

<genbank:region>1379..2083</genbank:region>

and

REGION: complement(1379..2083)

would be encoded as

<genbank:region>complement(1379..2083)</genbank:region>
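The proposed encoding copies the REGION value verbatim into a `genbank:region` property. Here is a minimal sketch of that mapping, assuming only the two forms shown above (`start..end` and `complement(start..end)`) need to be accepted. The class `RegionAnnotation` and its `toXml` method are hypothetical. The sketch produces only the element text. It does not attach the annotation through libSBOLj's API or declare the `genbank` namespace.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not libSBOLj API): turns a REGION value into the
// <genbank:region> element text proposed in this issue.
public class RegionAnnotation {

    // Accepts "start..end" or "complement(start..end)", the two forms in the examples.
    private static final Pattern REGION =
            Pattern.compile("(?:\\d+\\.\\.\\d+|complement\\(\\d+\\.\\.\\d+\\))");

    public static String toXml(String region) {
        if (!REGION.matcher(region).matches()) {
            throw new IllegalArgumentException("Unsupported REGION value: " + region);
        }
        // The value is copied verbatim; complement() is kept, not normalized away.
        return "<genbank:region>" + region + "</genbank:region>";
    }

    public static void main(String[] args) {
        System.out.println(toXml("1379..2083"));
        System.out.println(toXml("complement(1379..2083)"));
    }
}
```

Keeping the value as an opaque string, rather than parsing it into start, end, and strand, preserves exactly what appeared in the GenBank file.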

nroehner commented 6 years ago

Here are examples of an input GenBank file featuring the REGION tag and the target SBOL output file:

https://drive.google.com/open?id=1UFQFp06-U92d3rJfrLtJcOav_8o_l53K

cjmyers commented 6 years ago

I cannot find any evidence online that a REGION in the ACCESSION line is a common thing to find there. Indeed, the only example I have is the one you sent me. Where did you get this? In fact, this does not even seem to be legal. ApE refuses to open this GenBank file, and SnapGene drops the REGION information when I read it in and re-save it in either of its GenBank output formats. I’m extremely hesitant to add support for a non-standard GenBank file feature, since this would mean that we would also create an illegal GenBank file on round-trip.

cjmyers commented 6 years ago

@nroehner please close this issue if you agree this is fixed.