GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

Add ability to open a plain text GFF #2113

Closed cmdcolin closed 3 years ago

cmdcolin commented 3 years ago

I think this will become more important once we have, for example, JBrowse Desktop. Users should be able to open a plain text GFF without first having to run tools such as tabix or GenomeTools to sort it, etc. (I think those are difficult barriers to entry).

We could either pre-process the file into a more efficient format and write it to disk (maybe a desktop-only thing, but it may also be useful for CLI users, who would be relieved of the burden of pre-processing their GFFs), or do the simple thing and load the whole file into memory.

cmdcolin commented 3 years ago

Note that the human NCBI RefSeq GFF is pretty large, coming in at 537MB uncompressed:

https://s3.amazonaws.com/jbrowse.org/genomes/hg19/ncbi_refseq/GRCh37_latest_genomic.sort.gff.gz

It is quite detailed, though, and many GFFs may not be nearly as large, but it does suggest that loading a GFF into memory could have a substantial cost.

cmdcolin commented 3 years ago

Note also that I think this has historically been a stumbling block for JBrowse 1 in some cases: people try to open a plain text GFF, the browser freezes (no fun), and then they post a thread like this one looking for a different tool: https://www.biostars.org/p/490779/#491309

rbuels commented 3 years ago

We could parse everything and keep the parsed features in memory, or we could hold just the file text in memory and scan it, parsing chunks on the fly to satisfy region queries.
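
A minimal sketch of the second option, assuming the whole file fits in one string. The names `GffFeature` and `queryGffText` are hypothetical and not part of JBrowse; only the standard nine tab-separated GFF columns are assumed, and attributes are left unparsed.

```typescript
// Hypothetical, simplified feature shape (not a JBrowse type).
interface GffFeature {
  refName: string
  type: string
  start: number // 1-based inclusive, as written in the GFF
  end: number // 1-based inclusive
  strand: string
  attributes: string // raw column 9, left unparsed in this sketch
}

// Scan the raw text line by line and parse only the lines that overlap
// the requested region. Nothing is cached between queries.
function queryGffText(
  text: string,
  refName: string,
  start: number,
  end: number,
): GffFeature[] {
  const features: GffFeature[] = []
  let pos = 0
  while (pos < text.length) {
    let eol = text.indexOf('\n', pos)
    if (eol === -1) {
      eol = text.length
    }
    // Drop a trailing carriage return so Windows line endings also work.
    const line = text.slice(pos, eol).replace(/\r$/, '')
    pos = eol + 1
    if (line.startsWith('##FASTA')) {
      break // an embedded sequence section follows; no more feature lines
    }
    if (line === '' || line.startsWith('#')) {
      continue // blank line, comment, or directive
    }
    const cols = line.split('\t')
    if (cols.length < 9 || cols[0] !== refName) {
      continue
    }
    const featStart = Number(cols[3])
    const featEnd = Number(cols[4])
    if (featEnd >= start && featStart <= end) {
      features.push({
        refName: cols[0],
        type: cols[2],
        start: featStart,
        end: featEnd,
        strand: cols[6],
        attributes: cols[8],
      })
    }
  }
  return features
}
```

Every query rescans the full text, so the cost per query grows with file size, but memory use stays close to the size of the text itself. Parent/child relationships (e.g. `Parent=` attributes) are not resolved here.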

rbuels commented 3 years ago

Or we could get really fancy and build some kind of tabix-like index when the file is opened, and hold that index in memory.
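
A rough sketch of what such an in-memory index might look like, assuming the file text is already held in a string. This is not the actual tabix index format: it is a simple fixed-width binning scheme, and `BIN_SIZE`, `buildBinIndex`, and `queryBinIndex` are hypothetical names. Unlike tabix, it does not require the file to be sorted.

```typescript
// Assumption: 16 kb bins, an arbitrary choice for this sketch.
const BIN_SIZE = 16384

// refName -> bin number -> character offsets of lines overlapping that bin
type BinIndex = Map<string, Map<number, number[]>>

// One pass over the text when the file is opened.
function buildBinIndex(text: string): BinIndex {
  const index: BinIndex = new Map()
  let pos = 0
  while (pos < text.length) {
    let eol = text.indexOf('\n', pos)
    if (eol === -1) {
      eol = text.length
    }
    const line = text.slice(pos, eol)
    if (line.startsWith('##FASTA')) {
      break
    }
    const cols = line.startsWith('#') ? [] : line.split('\t')
    if (cols.length >= 9) {
      let bins = index.get(cols[0])
      if (!bins) {
        bins = new Map()
        index.set(cols[0], bins)
      }
      const firstBin = Math.floor((Number(cols[3]) - 1) / BIN_SIZE)
      const lastBin = Math.floor((Number(cols[4]) - 1) / BIN_SIZE)
      for (let b = firstBin; b <= lastBin; b++) {
        const offsets = bins.get(b)
        if (offsets) {
          offsets.push(pos)
        } else {
          bins.set(b, [pos])
        }
      }
    }
    pos = eol + 1
  }
  return index
}

// Return the raw GFF lines overlapping refName:start-end (1-based inclusive).
function queryBinIndex(
  text: string,
  index: BinIndex,
  refName: string,
  start: number,
  end: number,
): string[] {
  const bins = index.get(refName)
  if (!bins) {
    return []
  }
  const seen = new Set<number>() // a line may be listed in several bins
  const lines: string[] = []
  const lastBin = Math.floor((end - 1) / BIN_SIZE)
  for (let b = Math.floor((start - 1) / BIN_SIZE); b <= lastBin; b++) {
    for (const offset of bins.get(b) ?? []) {
      if (seen.has(offset)) {
        continue
      }
      seen.add(offset)
      let eol = text.indexOf('\n', offset)
      if (eol === -1) {
        eol = text.length
      }
      const line = text.slice(offset, eol)
      const cols = line.split('\t')
      if (Number(cols[4]) >= start && Number(cols[3]) <= end) {
        lines.push(line)
      }
    }
  }
  return lines
}
```

A query then touches only the lines registered in the overlapping bins instead of rescanning the whole file. The tradeoff is that very long features (for example a chromosome-length `region` line) are registered in many bins, which a real implementation would probably want to handle separately.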