NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License

Add multi-threading like in remfile #78

Open rly opened 4 months ago

rly commented 4 months ago

Currently, remfile reads a large number of chunks faster than lindi. I think this is because remfile uses multi-threading for its requests. It would be nice to add that here for reading large numbers of chunks efficiently.

magland commented 4 months ago

Are you talking about loading multiple zarr/hdf5 chunks in parallel? Or loading a single large zarr/hdf5 chunk more efficiently using multiple threads?

If it's the former:

Right now the slicing is passed on to zarr, which I believe does not do multi-threaded reads. Are you suggesting that we handle slicing in a special way in LindiH5pyDataset?

The relevant code is here:

https://github.com/NeurodataWithoutBorders/lindi/blob/85e7897964c99c3dec2ce02e5835f3756be740ce/lindi/LindiH5pyFile/LindiH5pyDataset.py#L167-L219

Would we inspect the slicing argument to see whether the read could be parallelized, and then spawn multiple threads in this function? I think this would get complicated given all the various slicing possibilities.

If it's the latter: this could be done, but it's unclear to me how much gain we could achieve. It depends on the size of the chunks. If they are around 10 MB, I don't think there's much to gain from using multiple threads; in fact, there would probably be a loss of efficiency.
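For concreteness, here is a rough sketch (not from the thread) of what "using multiple threads for a single large chunk" could mean: split one HTTP object into byte ranges and request the ranges concurrently. The URL, offsets, and part count are placeholders; as noted above, whether this beats a single request depends on chunk size and latency.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_chunk_concurrently(url: str, start: int, size: int, n_parts: int = 4) -> bytes:
    """Fetch bytes [start, start + size) of url using n_parts concurrent range requests."""
    part_size = -(-size // n_parts)  # ceiling division

    def fetch_range(offset: int) -> bytes:
        end = min(start + size, offset + part_size) - 1
        response = requests.get(url, headers={"Range": f"bytes={offset}-{end}"})
        response.raise_for_status()
        return response.content

    offsets = range(start, start + size, part_size)
    with ThreadPoolExecutor(max_workers=n_parts) as executor:
        parts = list(executor.map(fetch_range, offsets))  # order is preserved
    return b"".join(parts)
```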

rly commented 4 months ago

Are you talking about loading multiple zarr/hdf5 chunks in parallel?

Yes.

Right now the slicing is passed on to zarr, which I believe does not do multi-threaded reads. Are you suggesting that we handle slicing in a special way in LindiH5pyDataset? Would we inspect the slicing argument to see whether the read could be parallelized, and then spawn multiple threads in this function? I think this would get complicated given all the various slicing possibilities.

I see. Yeah, that would be challenging.

In exploring this, I found a different possible solution: zarr can use fsspec's ReferenceFileSystem, which is an AsyncFileSystem and appears to support concurrent fetches of chunks: https://filesystem-spec.readthedocs.io/en/latest/async.html

We could defer calls to __getitem__ and getitems to fsspec's ReferenceFileSystem by using a zarr FSStore. Since our RFS spec does not really differ from the original RFS spec, I think we could also just replace LindiReferenceFileSystemStore with an FSStore configured with a ReferenceFileSystem (and move the static methods over). I haven't tested that yet, but with the former approach, I see a speedup of about 1.7x for one particular set of reads. I'll push a draft implementation of this in a branch.
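A minimal sketch (not from the thread, untested here) of what the ReferenceFileSystem approach described above might look like. The reference file name and dataset path are placeholders, and extra options would be needed for authenticated URLs:

```python
import fsspec
import zarr

# Build an fsspec mapper backed by ReferenceFileSystem (an AsyncFileSystem),
# so reads that span many chunks can be fetched concurrently.
mapper = fsspec.get_mapper(
    "reference://",
    fo="example.lindi.json",   # RFS-style reference spec (placeholder name)
    remote_protocol="https",   # protocol of the referenced chunk URLs
)
root = zarr.open(mapper, mode="r")

# Placeholder dataset path; a slice covering many chunks benefits the most.
data = root["acquisition/ElectricalSeries/data"][:10000]
```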

rly commented 4 months ago

We could also roll our own by overriding getitems in LindiReferenceFileSystemStore; the default implementation it inherits is a simple loop that calls __getitem__ for each key: https://github.com/zarr-developers/zarr-python/blob/19365e20522e5d67b0b14698a7c6c953c450c5c6/src/zarr/v2/_storage/store.py#L142
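A hedged sketch of that "roll our own" idea (not from the thread; the mixin name and thread count are illustrative): override getitems so the per-key fetches run in a thread pool instead of the sequential loop linked above.

```python
from concurrent.futures import ThreadPoolExecutor

class ConcurrentGetitemsMixin:
    """Illustrative mixin for a zarr store such as LindiReferenceFileSystemStore."""

    def getitems(self, keys, *, contexts=None):
        # Same semantics as the default loop: fetch each key with __getitem__
        # and omit keys that are missing, but do the fetches concurrently.
        def fetch(key):
            try:
                return key, self[key]
            except KeyError:
                return key, None

        with ThreadPoolExecutor(max_workers=16) as executor:
            results = list(executor.map(fetch, keys))
        return {key: value for key, value in results if value is not None}
```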

magland commented 3 months ago

@rly I think we should try to "roll our own" because lindi takes care of authenticating URLs of embargoed dandisets. Shall I take a crack at that?

rly commented 3 months ago

Ah, I see. Yeah, that would be great.