rly opened this issue 6 months ago
Are you talking about loading multiple zarr/hdf5 chunks in parallel? Or loading a single large zarr/hdf5 chunk more efficiently using multiple threads?
If it's the former:
Right now the slicing is passed on to zarr, which I believe does not do multi-threaded reads. Are you suggesting that we handle slicing in a special way in `LindiH5pyDataset`?
The relevant code would be here
Would we inspect the argument being passed to the slicing to see whether it could be parallelized, and then spawn multiple threads in this function? I think this would get complicated with all the various slicing possibilities.
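For illustration, here is a minimal sketch of what that special handling might look like for the simplest case, a contiguous slice on a 1-D dataset. Everything here (the wrapper class, the range splitting, the thread pool) is hypothetical, not existing lindi code:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class ParallelSliceDataset:
    """Hypothetical wrapper showing chunk-parallel reads for a 1-D slice."""

    def __init__(self, zarr_array):
        self._zarr_array = zarr_array

    def __getitem__(self, selection):
        # Handle only a plain step-1 slice on a 1-D array; ints, tuples,
        # ellipses, boolean masks, fancy indexing, etc. would each need
        # their own handling, which is where this approach gets complicated.
        if (
            isinstance(selection, slice)
            and self._zarr_array.ndim == 1
            and selection.step in (None, 1)
        ):
            start, stop, _ = selection.indices(self._zarr_array.shape[0])
            chunk = self._zarr_array.chunks[0]
            # Split the slice into roughly chunk-sized sub-ranges.
            bounds = list(range(start, stop, chunk)) + [stop]
            ranges = [slice(a, b) for a, b in zip(bounds[:-1], bounds[1:])]
            if ranges:
                # Read the sub-ranges concurrently; each read fetches its
                # chunks through the underlying store.
                with ThreadPoolExecutor() as pool:
                    parts = list(pool.map(lambda r: self._zarr_array[r], ranges))
                return np.concatenate(parts)
        return self._zarr_array[selection]  # defer everything else to zarr
```

Each sub-range read still goes through zarr, so any speedup depends on the underlying store releasing the GIL during network I/O, and this covers only one of the many selection types h5py supports.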
If it's the latter: This could be done, but it's unclear to me how much gain we could achieve. It depends on the size of the chunks. If they are around 10 MB, then I don't think there's much to gain from using multiple threads; in fact, there would probably be a loss of efficiency.
> Are you talking about loading multiple zarr/hdf5 chunks in parallel?
Yes.
> Right now the slicing is passed on to zarr, which I believe does not do multi-threaded reads. Are you suggesting that we handle slicing in a special way in `LindiH5pyDataset`? Would we inspect the argument being passed to the slicing to see whether it could be parallelized, and then spawn multiple threads in this function? I think this would get complicated with all the various slicing possibilities.
I see. Yeah, that would be challenging.
In exploring this, I found a different possible solution: Zarr can use fsspec's `ReferenceFileSystem`, which is an `AsyncFileSystem` and therefore seems to support concurrent fetches of chunks: https://filesystem-spec.readthedocs.io/en/latest/async.html

We could defer calls to `__getitem__` and `getitems` to fsspec's `ReferenceFileSystem` by using a zarr `FSStore`. Since our RFS spec does not really differ from the original RFS spec, I think we could also just replace `LindiReferenceFileSystemStore` with an `FSStore` configured with a `ReferenceFileSystem` (and move the static methods). I haven't tested that yet, but with the former approach, I see a speedup of about 1.7x for one particular set of reads. I'll push a draft implementation of this in a branch.
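For reference, the `FSStore` wiring described above might look roughly like this (untested; the file name and `remote_protocol` are placeholders for whatever the RFS spec actually references):

```python
import zarr

# "reference://" tells fsspec to build a ReferenceFileSystem; "fo" points at
# the reference spec, and remote_protocol is the protocol of the target URLs.
store = zarr.storage.FSStore(
    "reference://",
    mode="r",
    fo="example.lindi.json",   # placeholder spec file
    remote_protocol="http",
)
root = zarr.open(store, mode="r")
# Reads that span many chunks now go through ReferenceFileSystem's async
# cat(), so the chunk fetches can happen concurrently.
```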
We could also roll our own by overriding `getitems` in `LindiReferenceFileSystemStore`, which is currently a simple loop that calls `__getitem__` for each selection:
https://github.com/zarr-developers/zarr-python/blob/19365e20522e5d67b0b14698a7c6c953c450c5c6/src/zarr/v2/_storage/store.py#L142
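A minimal sketch of that override, assuming the zarr v2 `getitems` contract (missing keys are simply omitted from the result); the pool size is arbitrary and the rest of the class is unchanged:

```python
from concurrent.futures import ThreadPoolExecutor

from zarr.storage import Store


class LindiReferenceFileSystemStore(Store):
    # ... existing __getitem__, __setitem__, static methods, etc. ...

    def getitems(self, keys, *, contexts):
        # Fetch the requested chunks concurrently rather than looping
        # over __getitem__ serially as the base class does.
        def fetch(key):
            try:
                return key, self[key]
            except KeyError:
                return key, None  # omit missing keys, per the Store contract

        with ThreadPoolExecutor(max_workers=16) as pool:
            results = list(pool.map(fetch, keys))
        return {k: v for k, v in results if v is not None}
```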
@rly I think we should try to "roll our own" because lindi takes care of authenticating URLs of embargoed dandisets. Shall I take a crack at that?
Ah, I see. Yeah, that would be great.
Currently, Remfile reads a large number of chunks faster than lindi. I think that is because Remfile uses multi-threading for its requests. It would be nice to add that here for reading large numbers of chunks efficiently.