keiranmraine opened this issue 8 years ago
Technically, what you propose is like a BED file with a bunch of information encoded in the 4th column. Since BED and BEDTabix support is now in mainline, nothing blocks using a format like this; it just needs code to convert data into this format and to interpret it in the browser.
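To make the "interpret it" side concrete, here is a minimal Python sketch of decoding one such line. It assumes column 4 is the last column and holds the whole feature as JSON; the field names (`name`, `strand`, `subfeatures`) and the helper `parse_bed_json_line` are hypothetical, not JBrowse's actual schema or API, and a real implementation would live in the browser's JavaScript.

```python
import json

def parse_bed_json_line(line: str) -> dict:
    """Decode a BED-like line whose 4th (last) column is a JSON object.

    Columns 1-3 are chrom, 0-based start and end, as in BED. Splitting at
    most three times keeps everything after the third tab as the payload.
    """
    chrom, start, end, payload = line.rstrip("\n").split("\t", 3)
    feature = json.loads(payload)
    # The positional columns are what tabix indexes, so treat them as authoritative.
    feature.update({"seq_id": chrom, "start": int(start), "end": int(end)})
    return feature

# Hypothetical line, e.g. one returned by a tabix region query.
line = 'chr1\t1000\t5000\t{"name": "geneA", "strand": 1, "subfeatures": []}\n'
feat = parse_bed_json_line(line)
print(feat["name"], feat["start"], feat["end"])  # geneA 1000 5000
```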
Any implementation of this would need to make sure that the "old" format produced by `flatfile-to-json.pl` still works in the browser.
@billzt, @cmdcolin: relates to #780.
Currently, when a GFF3 file is converted to a gene/transcript track with `flatfile-to-json.pl`, a folder and at least 2 data files are generated per chromosome. For a human GRCh37 gene/transcript track with decoy and scaffold sequences, that comes to 984 `lf-*.jsonz` and 99 `hist-*.jsonz` files.
Have you thought about using tabix in a more novel way?
We use tabix to make pre-generated data structures easily accessible, specifically for gene data (everything after the first 3 columns is custom; column 5 holds a Perl data structure for the transcript).
You could build the standard JSON structure for each gene, but write it to file as one line per gene, then compress it with bgzip and index it with tabix.
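A minimal Python sketch of that layout, assuming each gene is already a dict with hypothetical `seq_id`, `start` (0-based) and `end` keys; the real JBrowse feature JSON is not reproduced here, and `to_tabix_lines` is an illustrative helper, not an existing tool. It emits one tab-separated, BED-like line per gene with the gene's JSON in column 4.

```python
import json
from typing import Iterable, Iterator

def to_tabix_lines(genes: Iterable[dict]) -> Iterator[str]:
    """Yield one BED-like line per gene: chrom, 0-based start, end, JSON.

    Records come out grouped by chromosome and sorted by start, as tabix
    requires. json.dumps escapes tabs and newlines inside strings, so each
    gene stays on one line with exactly four tab-separated columns.
    """
    for gene in sorted(genes, key=lambda g: (g["seq_id"], g["start"])):
        payload = json.dumps(gene, separators=(",", ":"), sort_keys=True)
        yield f'{gene["seq_id"]}\t{gene["start"]}\t{gene["end"]}\t{payload}\n'

# Hypothetical example genes; only seq_id/start/end are needed by this sketch.
genes = [
    {"seq_id": "chr2", "start": 500, "end": 900, "name": "geneC"},
    {"seq_id": "chr1", "start": 4000, "end": 6000, "name": "geneB"},
    {"seq_id": "chr1", "start": 1000, "end": 5000, "name": "geneA", "note": "a\tb"},
]
lines = list(to_tabix_lines(genes))

# After writing `lines` to lf.json, outside Python:
#   bgzip lf.json              # produces lf.json.gz
#   tabix -p bed lf.json.gz    # produces lf.json.gz.tbi
```

The `-p bed` preset tells tabix that columns 1 to 3 are chrom, 0-based start and end, which matches the layout above; the JSON column is opaque to tabix and only parsed by the client.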
This would replace the 1000+ files with 4 for the whole genome: `lf.json.gz` and `lf.json.gz.tbi`, plus `hist.json.gz` and `hist.json.gz.tbi`.
Even if one set of files is maintained per chromosome, this would still reduce the count to 184 (46 chromosomes × 4 files).