Open rly opened 6 months ago
@rly I think what's going on here...
h5py can read partial chunks (in this case there is no compression, so this is possible), whereas lindi/zarr is set up to always read entire chunks.
According to the lindi.json file, the chunk size is [13653, 384]
Maybe this is a zarr limitation/constraint/feature?
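To make the partial-chunk point concrete, here is a rough sketch of the byte arithmetic for an uncompressed, C-ordered chunk of shape (13653, 384). The 2-byte item size and the 1000-row slice are assumptions for illustration (neither is stated in this thread), and the helper name is made up.

```python
def row_slice_byte_range(chunk_shape, itemsize, row_start, row_stop):
    """Return (offset, length) in bytes for rows [row_start, row_stop)
    of one uncompressed, C-ordered 2-D chunk."""
    n_rows, n_cols = chunk_shape
    if not (0 <= row_start <= row_stop <= n_rows):
        raise ValueError("row range falls outside the chunk")
    row_bytes = n_cols * itemsize  # whole rows are contiguous in C order
    return row_start * row_bytes, (row_stop - row_start) * row_bytes


chunk_shape = (13653, 384)  # chunk size reported in lindi.json
itemsize = 2                # ASSUMPTION: 2-byte samples (e.g. int16)

full_chunk_bytes = chunk_shape[0] * chunk_shape[1] * itemsize
offset, length = row_slice_byte_range(chunk_shape, itemsize, 0, 1000)

print(f"whole chunk:      {full_chunk_bytes} bytes")
print(f"first 1000 rows:  {length} bytes at offset {offset}")
print(f"whole-chunk read fetches {full_chunk_bytes / length:.1f}x more data")
```

Because there is no compression, the requested rows map to a single contiguous byte range, which is what lets a reader that supports range requests skip the rest of the chunk.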
Ah, that makes sense. After changing the slice size to equal the chunk size, lindi now takes only ~2x as long as remfile. Inspecting the execution, it looks like zarr requests the key acquisition/ElectricalSeriesAp/data/0.0 twice. I'm trying to figure out why.
But also in digging through the Zarr code, I found that Zarr might be able to support partial reads: https://github.com/zarr-developers/zarr-python/blob/b1f4c509abaee1cb8dec18e3a973e1199226011a/src/zarr/v2/core.py#L2054-L2095
Right now, execution is going through the else branch because "get_partial_values" is not an attribute of LindiReferenceFileSystemStore.
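The attribute check described here can be sketched as below. This is not Zarr's actual code: the store classes, the `read_range` helper, and the simplified `get_partial_values(key, offset, length)` signature are all hypothetical, chosen only to show how the presence of the attribute changes how many bytes are fetched.

```python
class WholeChunkStore:
    """Hypothetical store that can only return entire values."""

    def __init__(self, blobs):
        self._blobs = blobs
        self.bytes_fetched = 0

    def __getitem__(self, key):
        data = self._blobs[key]
        self.bytes_fetched += len(data)
        return data


class RangeStore(WholeChunkStore):
    """Hypothetical store that also supports byte-range reads."""

    def get_partial_values(self, key, offset, length):
        # ASSUMPTION: simplified signature, not Zarr's real one.
        data = self._blobs[key][offset:offset + length]
        self.bytes_fetched += len(data)
        return data


def read_range(store, key, offset, length):
    """Use a range read if the store offers one, else fetch everything."""
    if hasattr(store, "get_partial_values"):
        return store.get_partial_values(key, offset, length)
    # Fallback ("else") path: fetch the whole value, then slice locally.
    return store[key][offset:offset + length]


blobs = {"data/0.0": bytes(range(256)) * 4}  # 1024 bytes of dummy data
plain = WholeChunkStore(blobs)
ranged = RangeStore(blobs)

a = read_range(plain, "data/0.0", 10, 5)
b = read_range(ranged, "data/0.0", 10, 5)
print(a == b, plain.bytes_fetched, ranged.bytes_fetched)
```

Both stores return the same bytes; only the amount of data transferred differs.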
Ah. It will be good to figure out whether the duplicate request can be avoided... and/or whether we should implement some caching for this type of situation.
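One possible shape for such caching is a small LRU wrapper around the store's `__getitem__`. This is only a sketch: `CountingStore` and `LRUChunkCache` are hypothetical names, not part of lindi or zarr, and thread safety and cache invalidation are ignored.

```python
from collections import OrderedDict


class CountingStore:
    """Hypothetical stand-in for a remote store; counts fetches."""

    def __init__(self, blobs):
        self._blobs = blobs
        self.fetch_count = 0

    def __getitem__(self, key):
        self.fetch_count += 1
        return self._blobs[key]


class LRUChunkCache:
    """Wrap a store and keep recently read values, bounded by total bytes."""

    def __init__(self, store, max_bytes):
        self._store = store
        self._max_bytes = max_bytes
        self._cache = OrderedDict()
        self._size = 0

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._store[key]
        if len(value) <= self._max_bytes:  # skip values that can never fit
            self._cache[key] = value
            self._size += len(value)
            while self._size > self._max_bytes:
                _, evicted = self._cache.popitem(last=False)  # drop oldest
                self._size -= len(evicted)
        return value


key = "acquisition/ElectricalSeriesAp/data/0.0"
remote = CountingStore({key: b"\x00" * 1024})
cached = LRUChunkCache(remote, max_bytes=4096)

first = cached[key]
second = cached[key]  # served from the cache, no second fetch
print(remote.fetch_count)
```

With a wrapper like this, a repeated request for the same key would reach the underlying store only once, at the cost of holding recent chunks in memory.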
Do you think we should set the get_partial_values attribute somehow?
Yeah, I think that would be nice, but not urgent. For most large reads, I think it would not make a big difference because the read will be mostly full chunks and some part of a chunk on each axis. And most big datasets are compressed.
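A quick way to check the "mostly full chunks" intuition is to compare the elements a slice needs with the elements whole-chunk reads would fetch. The sketch below uses the (13653, 384) chunk shape from this thread; the slice bounds are made-up examples and the helper name is hypothetical.

```python
def read_overhead(chunk_shape, slice_start, slice_stop):
    """Return (needed, fetched) element counts for a rectangular slice
    when every touched chunk has to be read in full."""
    needed = 1
    fetched = 1
    for chunk_len, start, stop in zip(chunk_shape, slice_start, slice_stop):
        first_chunk = start // chunk_len
        last_chunk = (stop - 1) // chunk_len
        needed *= stop - start
        fetched *= (last_chunk - first_chunk + 1) * chunk_len
    return needed, fetched


chunk_shape = (13653, 384)

# Large read (example bounds): mostly full chunks, partial ones at the edges.
big_needed, big_fetched = read_overhead(chunk_shape, (5000, 0), (200000, 384))
print(f"large read uses {big_needed / big_fetched:.1%} of fetched elements")

# Small read (example bounds): a fraction of a single chunk.
small_needed, small_fetched = read_overhead(chunk_shape, (0, 0), (1000, 384))
print(f"small read uses {small_needed / small_fetched:.1%} of fetched elements")
```

For the large example nearly everything fetched is used, while the small example discards most of the chunk, which matches the point that partial reads matter mainly for small slices.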
If you have time, it would be great if you can take a look but no pressure. Otherwise, I'll try to take a look at it next week.
Makes sense. I'm not going to work on it right now.
Using remfile as below:
Takes 0.2 seconds on my laptop.
Using lindi as below:
Takes 2.4 seconds on my laptop.
The data chunk size is (13653, 384) with no compression. Nothing stands out in the LINDI JSON. I'm not sure if I am doing something wrong or if there is an inefficiency somewhere in the system.
I'll start looking into it. @magland, do you have any ideas about what might be going on?