brinkmanlab / IslandCompare

Pipeline for detecting and annotating genomic islands and relationships between the respective genomes

Result download options #171

Open innovate-invent opened 5 years ago

innovate-invent commented 5 years ago

Doing the conversion client-side would be a good option. There are JavaScript projects for parsing the different file types.

It might also be good to provide a tabular view of the data. Tabulator is a popular JavaScript library that supports manipulating, filtering, and reordering the rows and columns of a table. It can also download the data after these manipulations, so users could export the results in nearly any configuration they want. This would also resolve #121.
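As a rough illustration of the "filter, pick columns, then download" idea, here is a minimal sketch. It is written in Python only to show the logic; a client-side version would do the same in JavaScript. The `export_table` helper, the row fields, and the island names are hypothetical and are not part of IslandCompare.

```python
import csv
import io


def export_table(rows, columns, keep=lambda row: True):
    """Return CSV text containing only `columns` (in that order) of the kept rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop fields the user did not select
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        if keep(row):
            writer.writerow(row)
    return buffer.getvalue()


# Hypothetical result rows; the real field names may differ.
islands = [
    {"genome": "genome_A", "start": 654, "end": 26955, "name": "island_1"},
    {"genome": "genome_B", "start": 3190, "end": 57412, "name": "island_2"},
]

# Keep islands longer than 30 kb and export a reordered subset of columns.
csv_text = export_table(
    islands,
    columns=["name", "start", "end"],
    keep=lambda row: row["end"] - row["start"] > 30000,
)
print(csv_text)
```

The point of the sketch is that the export reflects whatever view the user has built, rather than a fixed set of download formats.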

There is an issue with providing output in GenBank format. GenBank supports only a limited number of feature types, and genomic islands are not one of them.

innovate-invent commented 4 years ago

There are some barriers that will need to be overcome before GenBank (or EMBL) format can be supported for output.

GenBank supports a limited number of feature types, and genomic islands are not one of them. I emailed the NIH asking how to store unsupported features, and they recommended using the misc_feature feature type. For example:

misc_feature    654..26955
                         /note="AbGRI1-5 genomic island"

This is not ideal, as it places structured data in a free-form text field.
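To make the drawback concrete, here is a minimal sketch of what a consumer would have to do to recover the island from a misc_feature. It assumes a naming convention of "<name> genomic island" in the note, which is purely hypothetical; nothing in the format enforces it, so any variation in wording breaks the lookup.

```python
import re

# Hypothetical convention: the note reads "<name> genomic island".
NOTE_PATTERN = re.compile(r"^(?P<name>.+?)\s+genomic island$", re.IGNORECASE)


def island_name_from_note(note):
    """Return the island name if the note follows the assumed convention, else None."""
    match = NOTE_PATTERN.match(note.strip())
    return match.group("name") if match else None


print(island_name_from_note("AbGRI1-5 genomic island"))   # matches the convention
print(island_name_from_note("genomic island AbGRI1-5"))   # same meaning, not recognised
```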

If genomic islands can accurately be referred to as mobile elements, then another feature type was recommended:

mobile_element  3190..57412
                           /note="Integrative Element (IE)"
                           /mobile_element_type="other:Acinetobacter Genomic
                           Island 1 (AGI1)"

However, this does not contain structured data identifying the feature as a genomic island.

My alternative proposal is:

mobile_element  3190..57412
                           /note="Integrative Element (IE)"
                           /mobile_element_type="other:genomic_island"
                           /standard_name="Acinetobacter Genomic
                           Island 1 (AGI1)"

This conforms to the GenBank standard described here: http://www.insdc.org/files/feature_table.html. The mobile_element_type feature qualifier is defined as semi-structured data, and genomic_island is the appropriate term from the Sequence Ontology. The Sequence Ontology also defines genomic_island as a descendant of mobile_genetic_element.
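A minimal sketch of how the proposed feature could be written out, assuming the usual flat-file layout (feature key starting in column 6, location and qualifiers in column 22, lines of at most 80 characters). The function names are hypothetical and this is not the stitcher's code; it only renders the feature block and does not validate a whole record.

```python
import textwrap

KEY_INDENT = 5     # feature key starts in column 6
QUAL_INDENT = 21   # location and qualifiers start in column 22
LINE_WIDTH = 80    # assumed maximum line length of the flat file


def format_feature(key, start, end, qualifiers):
    """Render one feature as feature-table lines from (name, value) qualifier pairs."""
    key_field = f"{key:<{QUAL_INDENT - KEY_INDENT}}"
    lines = [f"{' ' * KEY_INDENT}{key_field}{start}..{end}"]
    for name, value in qualifiers:
        # Embedded double quotes are escaped by doubling them.
        escaped = str(value).replace('"', '""')
        text = f'/{name}="{escaped}"'
        # Long values continue on following lines at the same indent.
        for part in textwrap.wrap(text, width=LINE_WIDTH - QUAL_INDENT):
            lines.append(" " * QUAL_INDENT + part)
    return "\n".join(lines)


def genomic_island_feature(start, end, name, note=None):
    """Build the proposed mobile_element feature for a genomic island."""
    qualifiers = []
    if note:
        qualifiers.append(("note", note))
    qualifiers.append(("mobile_element_type", "other:genomic_island"))
    qualifiers.append(("standard_name", name))
    return format_feature("mobile_element", start, end, qualifiers)


feature_text = genomic_island_feature(
    3190, 57412, "Acinetobacter Genomic Island 1 (AGI1)",
    note="Integrative Element (IE)",
)
print(feature_text)
```

Keeping the island's identity in mobile_element_type and its name in standard_name means a reader can find islands by an exact qualifier match instead of parsing free text.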

The other major barrier is that the stitcher currently generates invalid GenBank files. See https://github.com/brinkmanlab/galaxy-tools/issues/5 and https://github.com/brinkmanlab/galaxy-tools/issues/6. The first linked issue could be resolved by https://github.com/brinkmanlab/galaxy-tools/issues/8, but I doubt the second would be. See also #144.