cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0

Reading zipped response files #106

Open ckarwin opened 7 months ago

ckarwin commented 7 months ago

Is it possible for the response class to take h5.gz or h5.zip files?

israelmcmc commented 7 months ago

Some context and thoughts:

The files in Wasabi ending in .h5 are actually already compressed with zip, just internally. HDF5 decompresses them on the fly, but only the portion you are accessing.
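To illustrate the point about internal compression, here is a minimal sketch using h5py. The dataset name `response`, the array shape, and the chunk layout are made up for the example and are not taken from the actual response files. It writes a chunked, gzip-compressed dataset and then reads a single row; HDF5 only has to decompress the chunks that the row touches.

```python
import os
import tempfile

import h5py
import numpy as np


def write_compressed(path, data, chunks):
    """Write `data` to an HDF5 file with internal (per-chunk) gzip compression."""
    with h5py.File(path, "w") as f:
        f.create_dataset(
            "response",          # hypothetical dataset name
            data=data,
            chunks=chunks,       # compression is applied chunk by chunk
            compression="gzip",
            compression_opts=4,  # gzip level
        )


def read_row(path, index):
    """Read one row; only the chunks overlapping it are decompressed."""
    with h5py.File(path, "r") as f:
        return f["response"][index]


# Toy stand-in for a response matrix (hypothetical shape).
data = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
path = os.path.join(tempfile.mkdtemp(), "toy_response.h5")

write_compressed(path, data, chunks=(8, 64))
row = read_row(path, 10)
```

By contrast, a file compressed as a whole (h5.gz or h5.zip) would have to be fully decompressed before any row could be read.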

In general, it wouldn’t be great to have to decompress the whole file every time you need to access part of it, as would be the case if we accepted h5.gz files. That is fine for small files, but for large files like this one, especially once we go to finer resolution, it might not be practical.

I found that this on-the-fly decompression was slowing the rotation of the response significantly. That’s why I ended up creating an .h5 file without internal compression, which only needs to be decompressed once and for all.
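For reference, a copy without internal compression could be produced along these lines. This is only a sketch with h5py, not necessarily how the file mentioned above was made. The helper name `strip_compression` and the dataset name are hypothetical; the helper loads each dataset fully into memory and does not copy attributes, so it would need more care for a real response file.

```python
import os
import tempfile

import h5py
import numpy as np


def strip_compression(src_path, dst_path):
    """Copy every dataset from src to dst without any compression filter."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:

        def copy_dataset(name, obj):
            # visititems also visits groups; only datasets carry data.
            if isinstance(obj, h5py.Dataset):
                # No `compression` argument, so the copy is stored uncompressed.
                # Intermediate groups in `name` are created automatically.
                dst.create_dataset(name, data=obj[()])

        src.visititems(copy_dataset)


# Build a small compressed file to run the helper on (toy data).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "compressed.h5")
dst_path = os.path.join(workdir, "uncompressed.h5")

original = np.random.default_rng(0).random((32, 32))
with h5py.File(src_path, "w") as f:
    f.create_dataset("grp/response", data=original, compression="gzip")

strip_compression(src_path, dst_path)
```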

I see this as a temporary workaround though. I think there might be a way to write the code to reduce the time spent on the internal decompression. I think right now it is decompressing the same data multiple times. I investigated this briefly, and part of the issue is that the HDF5 cache fills up before we access the same piece of data again.
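One knob that might help with the cache issue is the raw-data chunk cache size, which h5py exposes through the `rdcc_nbytes` argument of `h5py.File` (the HDF5 default is 1 MiB per dataset). The sketch below is an assumption about what could be tried, not something tested against the real response files. The dataset name, shapes, and the 64 MiB value are made up, and the toy data is small enough to fit in any cache, so it only demonstrates the parameter.

```python
import os
import tempfile

import h5py
import numpy as np

CACHE_BYTES = 64 * 1024**2  # 64 MiB, an arbitrary illustrative value


def sum_rows_twice(path, cache_bytes):
    """Pass over the dataset twice, as a stand-in for repeated access."""
    total = 0.0
    # rdcc_nbytes sets the chunk cache size for datasets opened via this file.
    with h5py.File(path, "r", rdcc_nbytes=cache_bytes) as f:
        dset = f["response"]  # hypothetical dataset name
        for _ in range(2):
            for i in range(dset.shape[0]):
                # If a chunk is still in the cache, it is not decompressed again.
                total += float(dset[i].sum())
    return total


# Toy compressed file: 32 rows of 256 ones, stored in chunks of 4 rows.
path = os.path.join(tempfile.mkdtemp(), "toy_response.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("response", data=np.ones((32, 256)),
                     chunks=(4, 256), compression="gzip")

total = sum_rows_twice(path, CACHE_BYTES)
```

Whether a larger cache is enough depends on the access pattern; if the rotation revisits chunks only after touching more data than the cache can hold, reordering the accesses would still be needed.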

I don't think that accepting whole zipped files is the way to go, as internal compression is one of the main features of HDF5. Let's leave this issue open to remind us about all of this.

ckarwin commented 7 months ago

Thanks for the detailed answer. Ok, sounds good.