GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

BAM store needs support for .csi indexes #926

Closed keiranmraine closed 6 years ago

keiranmraine commented 7 years ago

I'm pretty sure that it doesn't, but it's worth being aware that *.csi indexes will eventually replace *.bai, even for *.bam files.

https://github.com/samtools/hts-specs/issues/240#issuecomment-328872226

I'm not aware of any progress on migrating to htslib-based parsing of BAM/CRAM.

sagnikbanerjee15 commented 6 years ago

Hello,

I am working with barley, which has very large chromosomes. Could you please suggest a way to visualize the alignments in JBrowse that bypasses the issue with indices?

Thank you.

nathandunn commented 6 years ago

FYI: https://samtools.github.io/hts-specs/SAMv1.pdf

5.3 C source code for computing bin number and overlapping bins

The following functions compute bin numbers and overlaps for a BAI-style binning scheme with 6 levels and a minimum bin size of 2^14 bp. See the CSI specification for generalisations of these functions designed for binning schemes with arbitrary depth and sizes.

/* calculate bin given an alignment covering [beg,end) (zero-based, half-closed-half-open) */
int reg2bin(int beg, int end)
{
    --end;
    if (beg>>14 == end>>14) return ((1<<15)-1)/7 + (beg>>14);
    if (beg>>17 == end>>17) return ((1<<12)-1)/7 + (beg>>17);
    if (beg>>20 == end>>20) return ((1<<9)-1)/7 + (beg>>20);
    if (beg>>23 == end>>23) return ((1<<6)-1)/7 + (beg>>23);
    if (beg>>26 == end>>26) return ((1<<3)-1)/7 + (beg>>26);
    return 0;
}

/* calculate the list of bins that may overlap with region [beg,end) (zero-based) */
#define MAX_BIN (((1<<18)-1)/7)
int reg2bins(int beg, int end, uint16_t list[MAX_BIN])
{
    int i = 0, k;
    --end;
    list[i++] = 0;
    for (k = 1 + (beg>>26); k <= 1 + (end>>26); ++k) list[i++] = k;
    for (k = 9 + (beg>>23); k <= 9 + (end>>23); ++k) list[i++] = k;
    for (k = 73 + (beg>>20); k <= 73 + (end>>20); ++k) list[i++] = k;
    for (k = 585 + (beg>>17); k <= 585 + (end>>17); ++k) list[i++] = k;
    for (k = 4681 + (beg>>14); k <= 4681 + (end>>14); ++k) list[i++] = k;
    return i;
}

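The spec text above points to the CSI specification for the generalised form. As a rough sketch of what that generalisation looks like, the fixed shifts (14, 17, ...) become a loop over a configurable smallest bin size of 2^min_shift bp and a configurable number of levels below the root. The name `reg2bin_csi` and the `assert` precondition are illustrative assumptions, not taken from JBrowse or htslib; with min_shift = 14 and depth = 5 it should give the same bins as `reg2bin` above.

```c
#include <assert.h>
#include <stdint.h>

/* Bin number for an alignment covering [beg,end) (zero-based, half-open),
 * for a scheme whose smallest bins span 2^min_shift bp and which has
 * `depth` levels below the root bin 0.
 * Illustrative name; BAI corresponds to min_shift = 14, depth = 5. */
int reg2bin_csi(int64_t beg, int64_t end, int min_shift, int depth)
{
    int l, s = min_shift;
    int t = ((1 << depth * 3) - 1) / 7;  /* first bin number on the deepest level */

    assert(beg >= 0 && beg < end);       /* assumed precondition, not from the spec */

    /* Walk from the deepest level up: each step widens bins by a factor
     * of 8 (shift += 3) and moves t to the first bin of the level above. */
    for (--end, l = depth; l > 0; --l, s += 3, t -= 1 << l * 3)
        if (beg >> s == end >> s) return t + (int)(beg >> s);
    return 0;                            /* only the root bin contains the interval */
}
```

Using 64-bit coordinates is what lets a deeper scheme (for example depth = 6) address positions beyond what the `int`-based BAI version can bin.
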
keiranmraine commented 6 years ago

FYI, csi also applies to files that have traditionally used tabix indexing *.tbi:

$ tabix -h
...
Indexing Options:
   ...
   -C, --csi                  generate CSI index for VCF (default is TBI)
FredericBGA commented 6 years ago

Hello

Large VCF files also need to be indexed with a CSI index, so JBrowse cannot handle them right now.

cmdcolin commented 6 years ago

I've started some basic CSI parsing (VCF only for now) here: https://github.com/GMOD/jbrowse/tree/csi_index

cmdcolin commented 6 years ago

Woo! Tested, and it displays data at very large coordinates that tabix TBI can't index (when a chromosome is over a gigabase in length).

[Screenshot: localhost, 2018-06-25 18:48:54]
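
For context on why the fixed BAI/TBI scheme runs out of room: with smallest bins of 2^14 bp and 5 levels below the root, the root bin spans 2^(14 + 3*5) = 2^29 bp, roughly 537 Mbp, so a gigabase-scale chromosome does not fit. The sketch below just does that arithmetic. Both function names are hypothetical helpers for illustration and are not part of JBrowse or htslib.

```c
#include <stdint.h>

/* Number of bases a single reference sequence can span under a binning
 * scheme with smallest bins of 2^min_shift bp and `depth` levels below
 * the root (each level up is 8x wider, hence 3 bits per level). */
int64_t max_indexable_length(int min_shift, int depth)
{
    return (int64_t)1 << (min_shift + 3 * depth);
}

/* Hypothetical helper: smallest depth whose root bin covers seq_len bp. */
int depth_for_length(int64_t seq_len, int min_shift)
{
    int depth = 0;
    while (max_indexable_length(min_shift, depth) < seq_len) ++depth;
    return depth;
}
```

Under these assumptions, a 1 Gbp chromosome with min_shift = 14 needs one more level than the BAI-style scheme provides, which is the kind of flexibility CSI allows.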

cmdcolin commented 6 years ago

Got CSI working for BAM now also :) woo

nathanhaigh commented 6 years ago

Oh man...I almost wet myself with excitement! I want to test this out ASAP with wheat! :)

1 happy man at this prospect!

keiranmraine commented 6 years ago

... do I dare say that they are currently discussing/adding *.sbi indexing:

http://github.com/samtools/hts-specs/pull/321

(will help solve the "guessing" about chunks)

cmdcolin commented 6 years ago

Oh wow haha. Is that an official solution to "bam index index"?