Open innovate-invent opened 5 years ago
There are some barriers that will need to be overcome before Genbank format (or EMBL) can be supported for output.
Genbank supports a limited number of feature types, genomic islands not being one of them.
I emailed the NIH requesting advice on how to store unsupported features, and they recommended using the misc_feature feature type. For example:
misc_feature 654..26955
/note="AbGRI1-5 genomic island"
This is not ideal, as it places structured data in a free-form text field.
If genomic islands can accurately be referred to as mobile elements then another feature was recommended:
mobile_element 3190..57412
/note="Integrative Element (IE)"
/mobile_element_type="other:Acinetobacter Genomic
Island 1 (AGI1)"
but this does not contain structured data identifying it as a genomic island.
My alternative proposal is:
mobile_element 3190..57412
/note="Integrative Element (IE)"
/mobile_element_type="other:genomic_island"
/standard_name="Acinetobacter Genomic
Island 1 (AGI1)"
This conforms to the Genbank standard here: http://www.insdc.org/files/feature_table.html
The mobile_element_type feature qualifier is defined as semi-structured data, and genomic_island is the appropriate term from the Sequence Ontology. The Sequence Ontology also defines genomic_island as a descendant of mobile_genetic_element.
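To make the proposed layout concrete, here is a minimal Python sketch of how a writer might emit such a feature. The function names (format_feature, genomic_island_feature) are hypothetical and not part of the stitcher or any existing tool. The column positions assume the usual flat-file feature table layout: feature key from column 6, location and qualifiers from column 22, lines no longer than 80 characters.

```python
import textwrap

KEY_INDENT = 5     # feature key starts in column 6
QUAL_INDENT = 21   # location and qualifiers start in column 22
LINE_WIDTH = 80    # assumed maximum flat-file line length


def format_feature(key, start, end, qualifiers):
    """Render one feature as feature-table lines (hypothetical helper)."""
    location = f"{start}..{end}"
    lines = [" " * KEY_INDENT + key.ljust(QUAL_INDENT - KEY_INDENT) + location]
    for name, value in qualifiers:
        # Embedded double quotes are escaped by doubling them.
        text = '/{}="{}"'.format(name, value.replace('"', '""'))
        # Long qualifier values continue on following lines at the same indent.
        for part in textwrap.wrap(text, width=LINE_WIDTH - QUAL_INDENT):
            lines.append(" " * QUAL_INDENT + part)
    return "\n".join(lines)


def genomic_island_feature(start, end, name, note=None):
    """Build the mobile_element feature proposed in this issue."""
    qualifiers = []
    if note:
        qualifiers.append(("note", note))
    # Structured marker: the Sequence Ontology term goes in the type qualifier.
    qualifiers.append(("mobile_element_type", "other:genomic_island"))
    # The human-readable island name moves to /standard_name.
    qualifiers.append(("standard_name", name))
    return format_feature("mobile_element", start, end, qualifiers)


if __name__ == "__main__":
    print(genomic_island_feature(
        3190, 57412,
        "Acinetobacter Genomic Island 1 (AGI1)",
        note="Integrative Element (IE)",
    ))
```

Keeping the ontology term in /mobile_element_type means a parser can identify genomic islands by an exact string match instead of searching the free-text /note field.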
The other major barrier is that the stitcher currently generates invalid Genbank files. See https://github.com/brinkmanlab/galaxy-tools/issues/5 and https://github.com/brinkmanlab/galaxy-tools/issues/6 . The first linked issue could be resolved with https://github.com/brinkmanlab/galaxy-tools/issues/8 but I doubt the second issue would be. See also #144
Doing the conversion client-side would be a good option. There are JavaScript projects for parsing the different file types.
It might also be good to provide a tabular view of the data. Tabulator is a popular JavaScript library that supports manipulating, filtering and reordering the rows and columns of a table. It also supports downloading the data after these manipulations, so users can export it in nearly any configuration they want. This would also resolve #121