GMOD / bbi-js

Parser for bigwig and bigbed files
MIT License

Improve caching behavior #55

Open tuner opened 1 year ago

tuner commented 1 year ago

Hi,

I'm loading BigWig data for the whole genome, i.e., calling getFeatures for each chromosome. This results in many fetch requests, as expected.


However, the number of requests seems excessive (76), and many of them hit exactly the same range. For example, in the above case (https://genomespy.app/docs/grammar/data/lazy/#example_1), 25 requests hit the same 49-byte range and another 25 hit the same 8197-byte range. Because web browsers seem to be very bad at caching partial content, this adds quite a bit of latency.

There appears to be a caching mechanism in BlockView, but a new BlockView (and cache) is created for each getFeatures call. https://github.com/GMOD/bbi-js/blob/d239d409d1d7b2e62710afccb8c70134eadef50b/src/block-view.ts#L158

Instead of having a new cache for each BlockView, could there be a single shared cache in the BBI class, which could be used by all BlockViews? In my example case, the number of requests would drop from 76 to 28.
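To illustrate the idea, here is a minimal sketch of such a shared cache. The class name `SharedRangeCache` and the `fetchRange` signature are hypothetical, not part of the bbi-js API; the point is that caching the promise (rather than the resolved bytes) also deduplicates requests that are still in flight, so concurrent getFeatures calls for identical ranges share a single fetch.

```typescript
// Hypothetical sketch: one range cache owned by the BBI class and shared by
// all BlockViews, instead of a fresh cache per BlockView. Keys are byte
// ranges; values are promises, so in-flight requests are deduplicated too.
class SharedRangeCache {
  private cache = new Map<string, Promise<Uint8Array>>()

  constructor(
    private fetchRange: (start: number, length: number) => Promise<Uint8Array>,
  ) {}

  read(start: number, length: number): Promise<Uint8Array> {
    const key = `${start}:${length}`
    let pending = this.cache.get(key)
    if (!pending) {
      // First request for this range: start the fetch and remember the promise.
      pending = this.fetchRange(start, length)
      this.cache.set(key, pending)
    }
    return pending
  }
}

// Demo: 25 identical reads trigger only one underlying fetch.
async function demo() {
  let fetches = 0
  const cache = new SharedRangeCache(async (_start, length) => {
    fetches += 1
    return new Uint8Array(length)
  })
  await Promise.all(Array.from({ length: 25 }, () => cache.read(0, 49)))
  console.log(fetches) // prints 1
}
demo()
```

A real implementation would also want some eviction policy (e.g. LRU with a byte budget) so the cache does not grow without bound across whole-genome queries.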

I could make a PR at some point if this change is feasible and would not cause any undesired adverse effects.

cmdcolin commented 1 year ago

I would definitely be open to improvements here. We use https://github.com/rbuels/http-range-fetcher, which smooths over some issues like this (it is a special fetch implementation that tries to combine multiple fetch requests and cache the results; it was especially useful for cram-js, iirc), but I would be interested in making the default experience better too.

tuner commented 1 year ago

Ok! I'll do some experiments and make a PR if my proposal appears to work.