dasmoth / dalliance

Interactive web-based genome browser.
http://www.biodalliance.org/
BSD 2-Clause "Simplified" License

Support for large genomes in .2bit #226

Closed by realtkd 6 years ago

realtkd commented 6 years ago

I have a large (plant) genome that exceeds the 4Gb limit of the 2bit index. When I build the 2bit sequence with the -long option, dalliance no longer supports the resulting .2bit file and reports the error message "Unsupported version 1" on the console. Is there an easy way to fix or work around this?

dasmoth commented 6 years ago

Thanks for bringing this to my attention. The "version 1" 2bit files with 64-bit offsets are a relatively recent addition (indeed, https://genome.ucsc.edu/FAQ/FAQformat.html#format7 hasn't been updated yet, but fortunately the linked source code makes it fairly clear what's going on). I'll add support to Biodalliance this weekend.

Do you happen to know a good example of a >4Gb genome I can use in testing? Is wheat my best option? (Although it doesn't look like it'll quite exceed 4Gb in .2bit format...)
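
For readers following along, here is a minimal Python sketch of what reading both header versions might look like. It is not the Biodalliance code. It assumes the usual 2bit layout (four 32-bit header fields: signature, version, sequence count, reserved; then one index record per sequence with a 1-byte name length, the name, and an offset) and assumes that version 1 differs only in widening each index offset from 32 to 64 bits, as described above. The function name read_twobit_index is hypothetical.

```python
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number at the start of a .2bit file


def read_twobit_index(stream):
    """Return (version, {sequence name: file offset}) from a binary stream."""
    header = stream.read(16)
    if len(header) < 16:
        raise ValueError("truncated .2bit header")

    # The file may be written in either byte order; try both.
    for endian in ("<", ">"):
        signature, version, seq_count, _reserved = struct.unpack(
            endian + "4I", header
        )
        if signature == TWOBIT_SIGNATURE:
            break
    else:
        raise ValueError("not a .2bit file")

    if version not in (0, 1):
        raise ValueError(f"Unsupported version {version}")

    # Assumption: version 0 stores 32-bit offsets, version 1 stores 64-bit ones.
    offset_format = endian + ("Q" if version == 1 else "I")
    offset_size = struct.calcsize(offset_format)

    index = {}
    for _ in range(seq_count):
        (name_length,) = struct.unpack("B", stream.read(1))
        name = stream.read(name_length).decode("ascii")
        (offset,) = struct.unpack(offset_format, stream.read(offset_size))
        index[name] = offset
    return version, index
```

A reader that only accepts version 0 would stop at the version check, which matches the "Unsupported version 1" message reported at the top of this thread.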

realtkd commented 6 years ago

Thank you for the fast response. I guess wheat is a good choice; in fact, this is the genome I am working on. The actual size is approx. 17Gb, which well exceeds the 4Gb limit of the previous 2bit version (it is my understanding that the 4Gb limit refers to the sequence length, not the file size). I'll be happy to test the new version with support for large genomes.

N.B. I tried to package the sequence in the old 2bit with the effect that chromosomes after 5B were not 'reachable' in dalliance, which I assume is the consequence of the 2bit index being too small to address the whole genome.
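
The two readings of the 4Gb limit mentioned in this exchange (sequence length versus packed file size) can be compared with some rough arithmetic. The sketch below is illustrative only: it treats "approx. 17Gb" as exactly 17,000,000,000 bases, counts only the packed DNA at 2 bits per base, and ignores the header, index, and any N-block or mask-block records. The helper names are hypothetical, and the thread does not settle which reading the format tools actually enforce.

```python
LIMIT_32_BIT = 2 ** 32  # number of distinct values in a 32-bit unsigned field


def packed_bytes(base_count):
    """Bytes needed for the packed DNA alone, at 4 bases per byte."""
    return (base_count + 3) // 4


def limit_report(base_count):
    """Compare a genome size against 2**32 under both readings of the limit."""
    size = packed_bytes(base_count)
    return {
        "packed_bytes": size,
        "length_exceeds_32_bit": base_count >= LIMIT_32_BIT,
        "packed_bytes_exceed_32_bit": size >= LIMIT_32_BIT,
    }


report = limit_report(17_000_000_000)
print(report)
```

Under these assumptions, the sequence length is well past 2^32 (4,294,967,296), while the packed DNA alone (4,250,000,000 bytes) stays just under it, which is consistent with both comments above.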

dasmoth commented 6 years ago

Should be fixed now. Thanks once again for bringing the new format to my attention.

realtkd commented 6 years ago

I built from master and it works like a charm now; I can access the complete genome. Thanks a lot for the quick fix. Two minor things I noted. First, there seems to be a difference between 0.13.8 and master in how addViewListener builds the range: with 0.13.8 (the released file available from www.biodalliance.org/release-0.13/...) I see e.g. chr1A:1..1345 on the server side, whereas with master I see 1A:1..1345, so the 'chr' is missing. However, I could not pinpoint that in the diffs, so it might be something else. I fixed that on the server side for now. Second, master needs a version bump; it's still at 0.13.7.
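
The server-side fix itself is not shown in the thread. As a rough illustration of that kind of workaround, the sketch below accepts a range string in either form quoted above (chr1A:1..1345 or 1A:1..1345) and returns a consistently prefixed name. The function name normalize_range, the regular expression, and the choice of Python are all assumptions for illustration, not realtkd's actual code.

```python
import re

# Matches strings such as "chr1A:1..1345" or "1A:1..1345".
RANGE_PATTERN = re.compile(r"^(?P<name>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def normalize_range(text, prefix="chr"):
    """Parse 'name:start..end' and make sure the name carries the prefix."""
    match = RANGE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised range: {text!r}")

    name = match.group("name")
    if not name.startswith(prefix):
        name = prefix + name  # tolerate clients that drop the prefix

    return name, int(match.group("start")), int(match.group("end"))
```

Normalizing on the server keeps it tolerant of both client versions, at the cost of assuming that every sequence name in the genome really uses the same prefix.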