DC-analysis / dclab

Python library for the post-measurement analysis of real-time deformability cytometry (RT-DC) data sets
https://dclab.readthedocs.io

New file format based on S3 #213

Closed (paulmueller closed this issue 1 year ago)

paulmueller commented 1 year ago

Since h5py 3.2, there is the "ros3" file driver, which lets you read (and slice!) HDF5 files hosted on S3-compatible instances.
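For illustration, here is a minimal sketch of what slicing a remote file with the "ros3" driver could look like. The helper names `object_url` and `read_slice`, the path-style URL layout, the example endpoint, and the dataset path `events/deform` are assumptions made for this example; they are not taken from dclab or DCOR. The sketch also assumes a publicly readable object and an h5py build that includes the driver.

```python
def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL for an object on an S3-compatible server.

    Hypothetical helper; the actual URL layout depends on the server setup.
    """
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"


def read_slice(url: str, dataset: str, start: int, stop: int):
    """Read dataset[start:stop] from a publicly readable remote HDF5 file.

    Only the requested part of the dataset is read; the file is not
    downloaded as a whole.
    """
    import h5py  # imported lazily; must be built against HDF5 with ros3

    with h5py.File(url, "r", driver="ros3") as h5:
        return h5[dataset][start:stop]


if __name__ == "__main__":
    # Placeholder endpoint, bucket, and key (assumptions, not real resources)
    url = object_url("https://s3.example.org", "my-bucket", "data.rtdc")
    print(url)
    # deform = read_slice(url, "events/deform", 0, 100)
```

The network call is left commented out because the endpoint above is a placeholder.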

It would be great to have support for those files in dclab, especially since there is a lot of middleware for the DCOR file format.

See also https://github.com/DCOR-dev/DCOR-help/issues/4 for further implications.

paulmueller commented 1 year ago

The only obstacle appears to be that you need HDF5 built with S3 support, as mentioned in the release notes: https://docs.h5py.org/en/stable/whatsnew/3.2.html#what-s-new-in-h5py-3-2
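Whether a given installation meets this requirement can be checked at runtime. The following is a small sketch; the function name `ros3_available` is made up for this example. It relies on `h5py.registered_drivers()`, which lists the file drivers known to the installed h5py build.

```python
import importlib.util


def ros3_available() -> bool:
    """Return True if h5py is installed and its build registers "ros3"."""
    if importlib.util.find_spec("h5py") is None:
        # h5py is not installed at all
        return False
    import h5py

    return "ros3" in h5py.registered_drivers()


if __name__ == "__main__":
    print("ros3 driver available:", ros3_available())
```

A check like this would let calling code fall back to a different access method when the driver is missing.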

paulmueller commented 1 year ago

The workaround for this obstacle is to use s3fs with default_block_size=2048 (a small block size, in bytes, so that reading metadata is fast).
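A rough sketch of this workaround, assuming anonymous access to a public bucket: s3fs opens the object as a file-like object, which h5py can read without the "ros3" driver. The function name `open_with_s3fs`, the endpoint, and the `bucket/key` path are placeholders for this example and are not part of dclab.

```python
def open_with_s3fs(endpoint_url: str, path: str, block_size: int = 2048):
    """Open a remote HDF5 file through s3fs instead of the ros3 driver.

    `path` has the form "bucket/key". The caller is responsible for
    closing the returned h5py.File object.
    """
    import h5py  # imported lazily so that the module loads without them
    import s3fs

    fs = s3fs.S3FileSystem(
        anon=True,  # assumption: the object is publicly readable
        client_kwargs={"endpoint_url": endpoint_url},
        default_block_size=block_size,  # bytes fetched per read request
    )
    fobj = fs.open(path, mode="rb")
    # h5py accepts Python file-like objects opened in binary mode
    return h5py.File(fobj, "r")


if __name__ == "__main__":
    # Placeholder values (assumptions, not real resources):
    # h5 = open_with_s3fs("https://s3.example.org", "my-bucket/data.rtdc")
    # print(list(h5.attrs))
    pass
```

The small block size suits many small metadata reads; reading large feature arrays might benefit from a larger value, which is why it is exposed as a parameter here.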