Open Yating-L opened 6 years ago
Implementation sketch:
The primary advantage over the current hash implementation is that the index consists of only 2 files (.ix and .ixx) instead of a large file tree.
A trix store is implemented by the Dalliance browser, so that is an interesting point of reference.
In their sample browser, the data in the trix index associates a name with other feature identifiers, like this:
eif4a1 ENST00000293831.8,1 ENST00000380512.5,1 ENST00000396527.3,1 ENST00000577269.1,1 ENST00000577731.1,1 ENST00000577738.1,1 ENST00000577929.1,1 ENST00000578324.1,1 ENST00000578476.1,1 ENST00000578495.1,1 ENST00000578569.1,1 ENST00000578754.1,1 ENST00000579085.1,1 ENST00000579139.1,1 ENST00000580461.1,1 ENST00000580886.1,1 ENST00000580888.1,1 ENST00000581384.1,1 ENST00000581544.1,1 ENST00000581770.1,1 ENST00000581808.1,1 ENST00000581841.1,1 ENST00000582050.1,1 ENST00000582169.1,1 ENST00000582213.1,1 ENST00000582746.1,1 ENST00000582848.1,1 ENST00000583217.1,1 ENST00000583389.1,1 ENST00000583802.1,1 ENST00000583899.1,1 ENST00000584054.1,1 ENST00000584712.1,1 ENST00000584784.1,1 ENST00000584798.1,1 ENST00000584860.1,1 ENST00000584901.1,1 ENST00000585024.1,1
eif4a1p1 ENST00000420241.1,1
eif4a1p10 ENST00000428832.2,1
eif4a1p11 ENST00000451239.1,1
eif4a1p12 ENST00000551910.1,1
eif4a1p13 ENST00000415667.1,1
eif4a1p2 ENST00000422633.1,1 ENST00000545933.1,1
eif4a1p3 ENST00000411521.1,1
eif4a1p5 ENST00000428062.1,1
eif4a1p6 ENST00000448133.1,1
eif4a1p7 ENST00000421800.1,1
That's in the .ix file. After obtaining this "match" in the trix index, Dalliance goes back to the bigBed file and uses what they call the "extra index" (see BBIExtraIndex.prototype.lookup in their codebase) to find the feature location.
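To make the two-step lookup concrete, here is a rough Python sketch. It assumes each .ix line is a word followed by whitespace-separated `id,number` hits, as in the sample above. `EXTRA_INDEX` is a hypothetical in-memory stand-in for the bigBed extra index, and its coordinates are placeholders, not real annotations. This is not Dalliance's code.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, int, int]  # (reference name, start, end)


def parse_ix_line(line: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Split one .ix line into (word, [(feature_id, number), ...])."""
    word, *hits = line.split()
    parsed = []
    for hit in hits:
        # Each hit looks like "ENST00000420241.1,1": an ID, a comma, a number.
        feature_id, _, number = hit.rpartition(",")
        parsed.append((feature_id, int(number)))
    return word, parsed


# Hypothetical stand-in for the bigBed extra index (placeholder coordinates).
EXTRA_INDEX: Dict[str, Location] = {
    "ENST00000422633.1": ("chrA", 100, 200),
    "ENST00000545933.1": ("chrB", 300, 400),
}


def locate(line: str) -> List[Tuple[str, Location]]:
    """Step 1: read the IDs from the .ix line. Step 2: resolve each ID to a location."""
    _, hits = parse_ix_line(line)
    return [(fid, EXTRA_INDEX[fid]) for fid, _ in hits if fid in EXTRA_INDEX]


line = "eif4a1p2 ENST00000422633.1,1 ENST00000545933.1,1"
print(parse_ix_line(line))
print(locate(line))
```

The point is that the .ix file only yields identifiers; something else has to turn them into coordinates.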
The UCSC documentation also describes extra indexes and allows them on arbitrary fields:
https://genome.ucsc.edu/goldenpath/help/bigBed.html
I wanted to show this because it raises the question of whether we want to "wrap trix", as we discussed before, so that the location data lives in the trix file itself.
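One possible way to "wrap trix" would be to pack the location into the identifier token itself. The sketch below is purely hypothetical: the `id|ref:start-end` encoding and the function names are made up for illustration. Neither trix nor Dalliance does this, as far as I know. The only real constraint it relies on is that a token cannot contain whitespace or commas, since those delimit hits in the .ix sample above.

```python
from typing import Tuple


def encode_hit(feature_id: str, ref: str, start: int, end: int) -> str:
    """Pack an ID and a location into one token (hypothetical encoding)."""
    token = f"{feature_id}|{ref}:{start}-{end}"
    # Whitespace and commas would break the "word id,number id,number" layout.
    if any(c.isspace() or c == "," for c in token):
        raise ValueError(f"token contains a delimiter character: {token!r}")
    return token


def decode_hit(token: str) -> Tuple[str, str, int, int]:
    """Reverse encode_hit: recover (feature_id, ref, start, end)."""
    feature_id, loc = token.split("|", 1)
    ref, span = loc.rsplit(":", 1)
    start, end = span.split("-")
    return feature_id, ref, int(start), int(end)


# Placeholder coordinates, not a real annotation.
token = encode_hit("ENST00000420241.1", "chrA", 100, 200)
print(token)
print(decode_hit(token))
```

The trade-off is that the index gets larger and duplicates data already in the track file, but a search would no longer need a second lookup.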
Also see this thread about partial-match searches: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome-mirror/loZy2Ps7sDU
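For reference, a partial (prefix) match over sorted index words can be sketched as below. This only illustrates the general idea with a binary search over an in-memory list. It is not how the UCSC code or the linked thread implements it, and lowercasing the query is an assumption based on the lowercase words in the sample above.

```python
from bisect import bisect_left
from typing import List

# Words taken from the .ix sample above, already in sorted order.
WORDS = [
    "eif4a1", "eif4a1p1", "eif4a1p10", "eif4a1p11", "eif4a1p12", "eif4a1p13",
    "eif4a1p2", "eif4a1p3", "eif4a1p5", "eif4a1p6", "eif4a1p7",
]


def prefix_matches(words: List[str], query: str) -> List[str]:
    """Return every word in the sorted list that starts with the query."""
    prefix = query.lower()  # assumption: index words are stored lowercased
    matches = []
    # Jump to the first word >= prefix, then scan until the prefix stops matching.
    for word in words[bisect_left(words, prefix):]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


print(prefix_matches(WORDS, "EIF4A1P1"))
```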
It would be nice to support a Trix index (.ix/.ixx) for the name index. The current generate-names.pl creates a large number of files, which can cause problems when transferring, downloading, and storing JBrowse data.
UCSC uses the Trix index for fast free-text lookup: https://genome.ucsc.edu/goldenpath/help/trix.html The utility that builds it is ixIxx ("Create indices for simple line-oriented file of format `<symbol> <free text>`"), which can be downloaded from http://hgdownload.soe.ucsc.edu/admin/exe/
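As a rough idea of what a generator would need to produce, here is a Python sketch that writes the line-oriented input for ixIxx: one line per feature, with the symbol first and the free text after it. `write_ixixx_input` is a hypothetical helper, not part of JBrowse or the UCSC tools, and the tab separator is my assumption (any whitespace should work).

```python
import io
from typing import Iterable, TextIO, Tuple


def write_ixixx_input(records: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one '<symbol> <free text>' line per (symbol, text) record."""
    for symbol, text in records:
        if not symbol or any(c.isspace() for c in symbol):
            raise ValueError(f"symbol must be a single token: {symbol!r}")
        # Collapse newlines and repeated spaces so each record stays on one line.
        out.write(f"{symbol}\t{' '.join(text.split())}\n")


buf = io.StringIO()
write_ixixx_input(
    [("ENST00000420241.1", "eif4a1p1"), ("ENST00000428832.2", "eif4a1p10")],
    buf,
)
print(buf.getvalue(), end="")

# The resulting file would then be indexed with something like:
#   ixIxx in.text out.ix out.ixx
```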