HDFGroup / hdf5-json

Specification and tools for representing HDF5 in JSON
https://hdf5-json.readthedocs.io

Support large files #31

Open jreadey opened 8 years ago

jreadey commented 8 years ago

h5tojson.py and jsontoh5.py can't convert files whose size is comparable to the amount of physical memory on the machine the converter is running on.

jreadey commented 8 years ago

I'm tagging this as an "enhancement" rather than a bug since it was a known limitation of the design.

It may be worthwhile to investigate an alternative JSON parser that supports streaming, such as ijson: https://pypi.python.org/pypi/ijson/.
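
For reference, a minimal sketch of the kind of streaming parse ijson enables. The `datasets.item.value.item` prefix is illustrative only; the actual hdf5-json layout uses different keys, so a real converter would need the correct paths:

```python
# Sketch only: stream values out of a large JSON file with ijson instead
# of loading the whole document via json.load().
import ijson

with open("big_file.json", "rb") as f:
    # Yields one array element at a time, so memory use stays flat even
    # when the file is larger than RAM.
    for value in ijson.items(f, "datasets.item.value.item"):
        print(value)  # e.g. append incrementally to an HDF5 dataset here
```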

Would it make more sense to tackle this using a native-C implementation of the conversion tools?

ccoulombe commented 4 years ago

Any work towards this?

jreadey commented 4 years ago

Sort of... In HSDS we use what is basically the hdf5-json schema for metadata, but chunk data is stored as blobs. See: https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md for a description. This works pretty well - we've used it for "files" as large as 50 TB. "files" is in quotes since what you get at the end is a large collection of files in a tree structure.

This was done to support the HDF service, but the same approach could be used outside the server.
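
Outside the server, the same split could look something like this rough sketch (the paths, file names, and metadata fields here are made up for illustration, not the actual HSDS object-store schema; see the design doc linked above for the real layout):

```python
# Sketch: keep dataset metadata as small JSON documents and write chunk
# data as separate binary blobs, so memory use is bounded by one chunk
# rather than the whole dataset.
import json
from pathlib import Path

import numpy as np

root = Path("store/mydataset")
root.mkdir(parents=True, exist_ok=True)

data = np.arange(1_000_000, dtype="int32")
chunk_size = 250_000

# Metadata stays human-readable JSON and is tiny regardless of data size.
meta = {
    "shape": list(data.shape),
    "dtype": str(data.dtype),
    "chunk_size": chunk_size,
}
(root / "dataset.json").write_text(json.dumps(meta, indent=2))

# Each chunk becomes its own blob in the tree.
for i, start in enumerate(range(0, data.size, chunk_size)):
    chunk = data[start:start + chunk_size]
    (root / f"chunk_{i}.bin").write_bytes(chunk.tobytes())
```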

What type of problem are you looking to solve?