Add "indexChunks" option to BAM sources

danvk commented 9 years ago

This can dramatically speed up the initial load of Dalliance for large BAM tracks with correspondingly large BAI files.

For example, I have a 9 MB BAI file. Loading it over the network takes more than 20 seconds. But if I specify indexChunks, the first view only needs a single chromosome, which means fetching only ~600 KB of the BAI.

This results in a massive speedup.
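A minimal sketch of how per-reference chunks turn into a partial fetch. It assumes a hypothetical layout where `indexChunks.chunks[refId]` is a `[start, end)` byte range into the BAI covering that reference's bins and linear index; this is an illustration, not necessarily the exact shape Dalliance expects. The helper name `baiRangeForRef` is also hypothetical.

```javascript
// Hypothetical layout (assumption): indexChunks.chunks[refId] = [start, end),
// a half-open byte range into the BAI file for one reference sequence.
function baiRangeForRef(indexChunks, refId) {
  const chunk = indexChunks.chunks[refId];
  if (!chunk) {
    throw new Error(`No index chunk for reference ${refId}`);
  }
  const [start, end] = chunk;
  // HTTP Range headers are inclusive on both ends, so the exclusive end
  // becomes end - 1.
  return { start, end, header: `bytes=${start}-${end - 1}` };
}
```

With illustrative numbers, `baiRangeForRef({ chunks: [[8, 612008]] }, 0).header` yields `"bytes=8-612007"`; the first reference starts at byte 8 because the BAI begins with a 4-byte magic and a 4-byte reference count.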

This does some mild refactoring of bam.js and also adds a few test cases. Testing behavior in error conditions is super important!

danvk commented 9 years ago

One thing I like about this approach is that it's completely agnostic about how you get the index chunks. Maybe you're getting them from some offline process, maybe you've tacked them onto the end of the BAI file, maybe you've got an auxiliary JSON file.
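As one illustration of the auxiliary-JSON route, here is a sketch of loading and validating chunks from a sidecar file. The JSON schema (`{"chunks": [[start, end], ...]}`) and the function name `parseIndexChunks` are assumptions made up for this example, not a Dalliance or samtools format. The checks exercise the kind of error conditions mentioned above.

```javascript
// Hypothetical sidecar schema (assumption):
//   { "chunks": [[start, end], ...] }  // one [start, end) BAI byte range per reference
function parseIndexChunks(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
  if (!data || !Array.isArray(data.chunks)) {
    throw new Error("indexChunks JSON must have a 'chunks' array");
  }
  let prevEnd = 0;
  data.chunks.forEach((chunk, refId) => {
    if (!Array.isArray(chunk) || chunk.length !== 2 ||
        !Number.isInteger(chunk[0]) || !Number.isInteger(chunk[1])) {
      throw new Error(`chunk ${refId} must be a pair of integers`);
    }
    const [start, end] = chunk;
    // Each reference's index data follows the previous one in the BAI,
    // so ranges should be ascending, non-overlapping and non-empty.
    if (start < prevEnd || end <= start) {
      throw new Error(`chunk ${refId} [${start}, ${end}) is overlapping, inverted or empty`);
    }
    prevEnd = end;
  });
  return data;
}
```

Validating up front means a bad sidecar file fails loudly at load time rather than producing a truncated index read later.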

One thing I don't like is the minBlockIndex field. I don't think it should be necessary. I'm also not sure Dalliance's calculation of it is correct. For the synthetic4 dataset from the DREAM challenge, I get:

synthetic.challenge.set4.normal.bam.bai, minBlockIndex=1517
synthetic.challenge.set4.tumour.bam.bai, minBlockIndex=65536

I'm surprised they're so different!
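For context when looking into the discrepancy: BAI linear-index entries are BGZF virtual offsets, which per the SAM/BAM specification pack the compressed block's file offset into the upper 48 bits and the position within the decompressed block into the lower 16 bits. A minimal sketch of decoding one follows; `decodeVirtualOffset` is a hypothetical helper, not the code in bam.js. Since 65536 is exactly 2^16, it may be worth checking how the two parts are combined when minBlockIndex is computed.

```javascript
// Decode a BGZF virtual offset stored little-endian at bytes[pos..pos+8).
// Upper 48 bits: byte offset of the compressed block in the BAM file.
// Lower 16 bits: offset within that block once decompressed.
// BigInt is used because 64-bit values exceed Number's exact-integer range.
function decodeVirtualOffset(bytes, pos) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const v = view.getBigUint64(pos, true); // BAI is little-endian
  return {
    block: Number(v >> 16n),     // compressed block start (fits in 48 bits)
    offset: Number(v & 0xffffn), // position inside the uncompressed block
  };
}
```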

dasmoth commented 9 years ago

Thanks!

dasmoth/dalliance #121