ckarwin opened this issue 7 months ago
Some context and thoughts:
The files in wasabi ending in .h5 are actually compressed with zip, just internally. HDF5 decompresses them on the fly, but only the portion you are accessing.
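To illustrate why internal compression allows partial reads, here is a toy, stdlib-only model. It is not HDF5 itself: it assumes a dataset split into fixed-size chunks, each compressed independently with `zlib`. The class name `ChunkedStore` and the chunk size are hypothetical. Reading a slice inflates only the chunks that overlap it.

```python
import zlib

CHUNK = 1024  # bytes per chunk (illustrative; real HDF5 chunk shapes are set per dataset)


class ChunkedStore:
    """Toy stand-in for a chunked, internally compressed dataset (not HDF5)."""

    def __init__(self, data: bytes, chunk_size: int = CHUNK):
        self.chunk_size = chunk_size
        # Compress each chunk independently, similar in spirit to per-chunk filters.
        self.chunks = [
            zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)
        ]
        self.decompressed_chunks = 0  # counts how many chunks were inflated

    def read(self, start: int, stop: int) -> bytes:
        """Return data[start:stop], inflating only the overlapping chunks."""
        first = start // self.chunk_size
        last = (stop - 1) // self.chunk_size
        out = bytearray()
        for idx in range(first, last + 1):
            out += zlib.decompress(self.chunks[idx])
            self.decompressed_chunks += 1
        offset = first * self.chunk_size
        return bytes(out[start - offset:stop - offset])


data = bytes(range(256)) * 400  # 102400 bytes of example data
store = ChunkedStore(data)
piece = store.read(5000, 5100)
print(len(store.chunks), store.decompressed_chunks)  # 100 chunks stored, 1 inflated
```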
In general, it would not be ideal to decompress the whole file every time you need to access part of it, as would be the case if we accepted h5.gz files. That is fine for small files, but for large files like this one, especially once we go to finer resolution, it might not be practical.
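A rough comparison of the two approaches, again as a stdlib-only sketch rather than real HDF5: a single gzip stream (like an `.h5.gz`) has to be inflated in full to read any slice, while per-chunk compression only inflates the chunks touched. The data and chunk size are made up for illustration.

```python
import gzip
import zlib

data = bytes(range(256)) * 4000  # 1,024,000 bytes of example data
CHUNK = 4096

# Option A: whole-file compression, analogous to an .h5.gz archive.
whole = gzip.compress(data)


def read_whole(start, stop):
    # A plain gzip stream is not randomly accessible, so everything is inflated.
    full = gzip.decompress(whole)
    return full[start:stop], len(full)


# Option B: per-chunk compression, analogous to internal compression.
chunks = [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def read_chunked(start, stop):
    # Inflate only the chunks overlapping [start, stop).
    first, last = start // CHUNK, (stop - 1) // CHUNK
    buf = b"".join(zlib.decompress(chunks[i]) for i in range(first, last + 1))
    off = first * CHUNK
    return buf[start - off:stop - off], len(buf)


a, inflated_a = read_whole(10_000, 10_100)
b, inflated_b = read_chunked(10_000, 10_100)
print(inflated_a, inflated_b)  # bytes inflated to read the same 100-byte slice
```

The gap grows with file size: option A's cost scales with the whole file, option B's with the slice.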
I found that this on-the-fly decompression was slowing down the rotation of the response significantly. That's why I ended up creating an .h5 file without internal compression, which only needs to be decompressed once and for all.
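A minimal sketch of the "decompress once and for all" idea, assuming the uncompressed file is distributed as `<name>.h5.gz`. The helper `ensure_uncompressed` and the file name `response.h5.gz` are hypothetical, not part of the existing response class. It inflates the archive to disk only when the `.h5` is missing or older than the archive, and reuses it afterwards.

```python
import gzip
import os
import shutil
import tempfile


def ensure_uncompressed(gz_path: str) -> str:
    """Hypothetical helper: inflate <name>.h5.gz to <name>.h5 once, then reuse it."""
    if not gz_path.endswith(".gz"):
        return gz_path  # already uncompressed
    out_path = gz_path[:-3]
    # Decompress only if the target is missing or older than the archive.
    if (not os.path.exists(out_path)
            or os.path.getmtime(out_path) < os.path.getmtime(gz_path)):
        tmp_path = out_path + ".part"
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams in blocks, not all in memory
        os.replace(tmp_path, out_path)  # rename so readers never see a partial file
    return out_path


# Demo with placeholder bytes standing in for a real .h5 file.
workdir = tempfile.mkdtemp()
gz_file = os.path.join(workdir, "response.h5.gz")
payload = b"placeholder bytes standing in for an uncompressed .h5 file" * 100
with gzip.open(gz_file, "wb") as f:
    f.write(payload)

first = ensure_uncompressed(gz_file)   # inflates the archive
stamp = os.stat(first).st_mtime_ns
second = ensure_uncompressed(gz_file)  # reuses the existing .h5, no re-inflation
```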
I see this as a temporary workaround, though. There might be a way to restructure the code to reduce the time spent on internal decompression: right now it seems to be decompressing the same data multiple times. I investigated this briefly, and part of the issue is that the HDF5 cache fills up before we access the same piece of data again.
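The cache effect described above can be reproduced with a simple model. This sketch uses a plain LRU cache of decompressed chunks (HDF5's actual chunk cache has its own eviction policy), and the access pattern of repeated sweeps over 8 chunks is purely illustrative. When the cache holds fewer chunks than one sweep touches, every access misses and each chunk is inflated again on every pass.

```python
import zlib
from collections import OrderedDict


class ChunkCache:
    """Simple LRU cache of decompressed chunks (a model, not HDF5's implementation)."""

    def __init__(self, compressed_chunks, capacity):
        self.compressed = compressed_chunks
        self.capacity = capacity  # max number of decompressed chunks kept
        self.cache = OrderedDict()
        self.decompressions = 0

    def get(self, idx):
        if idx in self.cache:
            self.cache.move_to_end(idx)  # mark as most recently used
            return self.cache[idx]
        chunk = zlib.decompress(self.compressed[idx])
        self.decompressions += 1
        self.cache[idx] = chunk
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used chunk
        return chunk


# 8 compressed chunks of 4 KiB each.
chunks = [zlib.compress(bytes([i]) * 4096) for i in range(8)]
# Illustrative access pattern: sweep all 8 chunks three times.
pattern = list(range(8)) * 3

small = ChunkCache(chunks, capacity=4)  # smaller than the working set
for i in pattern:
    small.get(i)

large = ChunkCache(chunks, capacity=8)  # holds the whole working set
for i in pattern:
    large.get(i)

print(small.decompressions, large.decompressions)
```

With the small cache every access is a miss (24 decompressions); with the large one each chunk is inflated only once (8). In h5py, the per-file chunk cache can be enlarged through the `rdcc_nbytes` and `rdcc_nslots` arguments to `h5py.File`; reordering loops so each chunk is fully used before moving on is the other lever. Which one helps here depends on the real access pattern during rotation.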
I don't think accepting whole zipped files is the way to go, since internal compression is one of the main features of HDF5. Let's leave this issue open to remind us about all of this.
Thanks for the detailed answer. OK, sounds good.
Is it possible for the response class to take h5.gz or h5.zip files?