JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

save julia objects as a self contained group in a "normal" hdf5 file #398

Closed tkuraku closed 3 months ago

tkuraku commented 2 years ago

This is a feature request that came up while I was working with HDF5 files created by a separate program. With JLD2 I can't save a Julia object into an existing HDF5 file because it isn't a "JLD2" file. Instead of making the whole file a JLD2 file, it would be super useful for each saved Julia object to be a self-contained JLD2 group, so that it could readily be saved into an existing "normal" HDF5 file. This would be awesome for interoperability with files that don't originate from Julia. In the same vein, standard data types such as plain arrays could be transparently written and read the way they are in HDF5.jl, and only more complex Julia objects would need to be serialized into a JLD2 group.

JonasIsensee commented 2 years ago

Hi @tkuraku ,

> In the same vein, standard data types such as plain arrays could be transparently written and read the way they are in HDF5.jl, and only more complex Julia objects would need to be serialized into a JLD2 group.

This is already the case.
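
As a rough illustration of that point, the sketch below writes a plain array with JLD2 and reads it back with HDF5.jl. The file name and variable names are made up for the example, and it assumes both packages are installed; it is a minimal check, not a statement about every data type.

```julia
using JLD2, HDF5

# Temporary file; the name is arbitrary for this example.
path = joinpath(mktempdir(), "arrays.jld2")

A = collect(reshape(1.0:6.0, 2, 3))   # an ordinary 2x3 Matrix{Float64}

# Written by JLD2 as a dataset named "A".
jldsave(path; A)

# Read back through HDF5.jl, which treats the file as regular HDF5.
B = h5open(path, "r") do fd
    read(fd["A"])
end
```

If the round trip works as expected, `B` has the same element type and contents as `A`.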

On the more general case: to do what you are suggesting, JLD2 would need to be able to read and edit normal HDF5 files. This is hard. JLD2 does not rely on the HDF5 binary dependency and instead reimplements part of the HDF5 format spec (which is huge, with lots of bit-fiddling).

Because I also found this lack of interoperability quite frustrating, I created #388. With this PR it is possible to read a large portion of HDF5 files. However, the format spec only states what a file needs to look like to be valid, not how to create it. JLD2 files use the simplest way to create these files and leave out many optimizations (groups are really just lists and not funky heap structures). This is reasonably easy to produce and edit. However, to edit just any HDF5 file out there, you would need to implement the algorithms to update the more complex structures.

This is possible but also a significant undertaking.

tkuraku commented 2 years ago

I see, if you don't rely on the HDF5 binary it becomes a very big task indeed. Maybe an alternative for the case when you need interoperability would be to have JLD2 provide serialization and deserialization functions that can be used in conjunction with HDF5.jl, which uses the HDF5 binaries; when you don't need that, you can use JLD2 as is. Maybe something like this:

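using HDF5

# Proposed API only: JLD2.serialize / JLD2.deserialize are hypothetical functions from this suggestion.
h5open(data_file, "w") do fd
    fd["julia_object"] = JLD2.serialize(my_object)
end

# Return the object from the do-block so it is visible outside it.
my_object = h5open(data_file, "r") do fd
    JLD2.deserialize(read(fd["julia_object"]))
end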

Just an idea. Thanks for all your hard work!
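
The idea above can be approximated today without any new JLD2 functions, at least as a sketch: serialize the object to bytes with Julia's `Serialization` standard library and store those bytes as a `UInt8` dataset through HDF5.jl. This swaps JLD2's format for the stdlib serializer, whose byte format is not guaranteed to be stable across Julia versions, so it is not a long-term archival solution. The type `Measurement` and the helpers `to_bytes` and `from_bytes` are hypothetical names invented for this example.

```julia
using HDF5, Serialization

# Hypothetical example type standing in for "a more complex Julia object".
struct Measurement
    label::String
    values::Vector{Float64}
end

# Hypothetical helper: turn any object into a byte vector.
function to_bytes(obj)
    io = IOBuffer()
    serialize(io, obj)
    return take!(io)
end

# Hypothetical helper: rebuild the object from the stored bytes.
from_bytes(bytes::Vector{UInt8}) = deserialize(IOBuffer(bytes))

path = joinpath(mktempdir(), "mixed.h5")
obj = Measurement("run-1", [1.0, 2.5])

h5open(path, "w") do fd
    fd["plain_array"] = [1, 2, 3]        # ordinary HDF5 dataset
    fd["julia_object"] = to_bytes(obj)   # opaque UInt8 dataset holding the object
end

restored = h5open(path, "r") do fd
    from_bytes(read(fd["julia_object"]))
end
```

Other HDF5 tools will see `julia_object` only as an array of bytes, and the reading session must have `Measurement` defined to decode it.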

pvillacorta commented 9 months ago

Hi, I am really interested in this. I would need to store julia objects (specifically, functions) as a subset of a bigger HDF5 file. Have there been any new developments regarding this? Thank you.

JonasIsensee commented 8 months ago

> Hi, I am really interested in this. I would need to store julia objects (specifically, functions) as a subset of a bigger HDF5 file. Have there been any new developments regarding this? Thank you.

Functions are the one main thing that JLD2 cannot store (only by referencing the name...).
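
To make "referencing the name" concrete, here is a minimal sketch of the general pattern in plain Julia: keep only the function's name as a string (which any HDF5 writer can store) and look it up again in a module when loading. This illustrates the idea and is not JLD2's internal mechanism; `function_name` and `resolve_function` are hypothetical helpers, and the approach only works for named functions that are already defined in the loading session.

```julia
# Hypothetical helpers, not part of JLD2 or HDF5.jl.
function_name(f::Function) = String(nameof(f))
resolve_function(name::AbstractString, mod::Module) = getfield(mod, Symbol(name))

square(x) = x^2

# Only this string would be written to the file.
stored = function_name(square)

# On load, the name is resolved against a module that defines the function.
restored = resolve_function(stored, @__MODULE__)
result = restored(3)
```

The function body itself is never stored, so the definition has to travel with the code rather than with the data file.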