GMOD / bbi-js

Parser for bigwig and bigbed files
MIT License
8 stars 6 forks source link

reduce the number of header-related reads #32

Closed skinner closed 4 years ago

skinner commented 4 years ago

Hi Colin! We've never spoken, but from what I've seen, you've been doing a good job. I like how you've broken things down, like with this repo.

I've been using bbi-js over HTTP, and I've been noticing more requests than I expected in the dev tools. It's been making multiple requests for the endian check and for the main header read. Those seem redundant, and they're adding to the latency of the data loads (in my hands, at least).

This change makes fewer header requests by reusing data in the getHeader call chain. For the "bigbed file with large header" test, the number of reads goes from 11 to 3. And for the "loads volvox.bb" test, the number of reads goes from 8 to 3.

Also, sort of tangentially, I'd like access to the definedFieldCount header in order to interpret BigBed "rest" lines. I went ahead and included that in this PR since it's just one line but I can split it out if you'd prefer that.

codecov[bot] commented 4 years ago

Codecov Report

Merging #32 into master will decrease coverage by 0.08%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #32      +/-   ##
==========================================
- Coverage   79.87%   79.78%   -0.09%     
==========================================
  Files           6        6              
  Lines         467      465       -2     
  Branches       90       90              
==========================================
- Hits          373      371       -2     
  Misses         94       94
Impacted Files Coverage Δ
src/bbi.ts 90.76% <100%> (-0.14%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 13ad253...617a3a5. Read the comment docs.

cmdcolin commented 4 years ago

Wow nice! Glad to go ahead and merge. We use a sort of crazy extra caching layer called http-range-fetcher in the latest jbrowse's and this module is actually not optimized as good as it could be as a result https://www.npmjs.com/package/http-range-fetcher

I am fully open to further work on the fetching strategy in this though and I'm very happy to merge this :)

cmdcolin commented 4 years ago

(this module referring to @gmod/bbi)

skinner commented 4 years ago

Oh! I hadn't seen http-range-fetcher, thanks for the pointer. I'm going to think about that a bit; it's not immediately obvious to me whether (a) request range merging/caching is better done as a generic layer like http-range-fetcher, or (b) the more-specific layer (like @gmod/bbi) can be smarter about choosing request boundaries given its deeper knowledge about the structure of the file and earlier knowledge about the genomic region it's loading. Or if (b) is a win, whether or not it's worth the work/complexity to get there.

Pragmatically, this PR and http-range-fetcher probably scratch my immediate itch though. Thanks!

cmdcolin commented 4 years ago

It is indeed a weird balance whether to make the library smart or just have this sort of crazy global cache. The crazy global cache has worked pretty well but I think it is worth it for making this library work well as a standalone and I don't necessarily expect everyone to use http-range-fetcher

cmdcolin commented 4 years ago

Went ahead and published a 1.0.27 with this change