Open koehlerson opened 1 year ago
In my opinion, if you want to reliably serialize/deserialize arbitrary Julia object then the best is to
Serialization
stdlib.Trying to keep backwards compatibility by writing these cusom serialization functions sounds kind of annoying to me.
I'd imagine to write as a sane default the .toml
files of the current env to the binary output in the same way JLSO does. I'm not sure if I agree fully on the custom serialization part. There are building blocks of the DofHandler
which should be easily abstracted away by e.g. scalars that doesn't change over versions. One particular example I can think of are the interpolations.
It is very likely that I can describe a Lagrange interpolation by a triple of refshape
, order
and dim
which are invariant to changes of Interpolation
. I wouldn't aim for a full backwards compatibility, just deconstructing the structs to julia native/primitive types as much as possible and thereby maximizing the backwards compatibility. I thought that the type remapping part could potentially be used after big breaking changes
I think this would be a nice feature! But I do share Kristoffer's concerns that it might add quite a lot to Ferrite's maintenance load...
From my point of view, (or perhaps as a starting point), the current issue is to save the constraint handler when used with anonymous functions. Are there any nice solutions for this? (xref my question/attempt https://github.com/JuliaIO/JLD2.jl/issues/288#issuecomment-1519076407).
For now I'm creating custom structs/using Returns
(example). But it would be nice to be able to reload without erroring (even if one cannot recover the function)
I think this would be a nice feature! But I do share Kristoffer's concerns that it might add quite a lot to Ferrite's maintenance load...
I'm not entirely sure why that is, because you would provide a custom serialization that is valid for a single Ferrite version (and in best case across versions if they don't break the structs) together with a bundled Manifest and Project toml entry of the current environment
From my point of view, (or perhaps as a starting point), the current issue is to save the constraint handler when used with anonymous functions. Are there any nice solutions for this? (xref my question/attempt JuliaIO/JLD2.jl#288 (comment)). For now I'm creating custom structs/using
Returns
(example). But it would be nice to be able to reload without erroring (even if one cannot recover the function)
I think this issue is at the end of a possible implementation. One should start with a DofHandler
serialization such that you can recover the field information properly. Therefore, you'd need to specify how you decompose Cell
s, Interpolation
s etc. Boundary information would be the last step imo, if the idea is accepted as being worth to implement, which doesn't seem really the case right now.
We might want to think about recommendations for users regarding IO / solution iteration and put them into the docs, possibly with more examples. Related issues in the ecosystem are are https://github.com/Ferrite-FEM/FerriteViz.jl/issues/1 and https://github.com/Ferrite-FEM/Ferrite.jl/issues/828 .
What do you think about a standardized output for HDF5 for all types where it is possible?
julia> h5open("blah.h5", "w") do h5f
h5f["grid/cells"] = grid.cells
end
Not necessary standardized, but at least recommendations or a guideline.
But why not standardize things that are possible
julia> h5open("blah.h5", "w") do h5f
h5f["grid/cells"] = grid.cells
h5f["grid/nodes"] = grid.nodes
for (facesetname,faceset) in grid.facesets
h5f["grid/facesets/$facesetname"] = collect(faceset)
end
end
Seems weird to me that every user has to write something like this in order to export a grid to HDF5
But why not standardize things that are possible
I think this is Pandora's box. What is the format that covers all possible use cases? The code snippet you shows already just works for our internal standard serial grid, but not for the p4est grid (information about the trees is lost) and also not for the distributed grid. If we have multiple solutions on a grid with the same dof handler, how is this handled (e.g. for time series) and so on. If I have multiple grids, this also already breaks down. This is very closely related to the struggle in https://github.com/Ferrite-FEM/FerriteViz.jl/issues/1 which just tried to solve this abstraction layer for a single thing.
If you have a solution to this I am happy to help on the implementation anyway.
Just to be clear for others. What I would like to see is something like this
julia> h5open("blah.h5", "w") do h5f
h5f["grid"] = grid
end
ERROR: Type Array does not have a definite size.
Stacktrace:
which is not possible since HDF5.jl needs isbitstype
or Vectors of primitive or isbitstypes. So, the only way around is to overload setindex!
julia> function Base.setindex!(parent::HDF5.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
parent["grid/cells"] = grid.cells
parent["grid/nodes"] = grid.nodes
for (facesetname,faceset) in grid.facesets
parent["grid/facesets/$facesetname"] = collect(faceset)
end
end
julia> h5open("blah.h5", "w") do h5f
h5f["grid"] = grid
end
Grid{2, Quadrilateral, Float64} with 400 Quadrilateral cells and 441 nodes
In order to read the things back in you get now a Dict
julia> h5open("blah.h5", "r") do h5f
read(h5f["grid"])
end
Dict{String, Any} with 3 entries:
"nodes" => NamedTuple{(:x,), Tuple{NamedTuple{(:data,), Tuple{NamedTuple{(Symbol("1"), Symbol("2")), Tuple{Float64, Float64}}}}}}[(x = (data = (var"1" = -1.…
"cells" => NamedTuple{(:nodes,), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4")), NTuple{4, Int64}}}}[(nodes = (var"1" = 1, var"2" = 2…
"facesets" => Dict{String, Any}("left"=>NamedTuple{(:idx,), Tuple{NamedTuple{(Symbol("1"), Symbol("2")), Tuple{Int64, Int64}}}}[(idx = (var"1" = 1, var"2" = 4)…
which is again somewhat inconvenient and one would ideally do the same as for a isbitstype
in HDF5
julia> h5open("blah.h5", "r") do h5f
read(h5f["grid"],Grid)
end
Then people can save in whatever order/form they want, just a small convenience layer (which isn't even visible/noticeable while using it) in order to make life easier and share code that would be duplicated anyways
In general, I think having some binary exchange format for Ferrite data would be nice!
Just to mention an alternative approach, which can be done from outside Ferrite.jl
without doing type piracy (becomes more useful/practical after #692) would be to define
struct FerriteHDF{G<:AbstractGrid,F<:HDF5.File}
grid::G
h5f::F
end
function FerriteHDF(name::String, grid::AbstractGrid)
h5f = open(name, "w")
h5f ["grid/cells"] = grid.cells
h5f ["grid/nodes"] = grid.nodes
for (facesetname,faceset) in grid.facesets
h5f["grid/facesets/$facesetname"] = collect(faceset)
end
return FerriteHDF(grid, h5f)
end
Base.close(io::FerriteHDF) = close(io.h5f)
# Could also define the do-block, append modes and other conveniences as required....
# And define e.g. `Ferrite.write_celldata(::FerriteHDF, data)`
# Would be even more general if the same was done for the dofhandler instead of the grid...
IMO, doing something like this in an external package could be nice - and if it becomes well-adopted we can consider bringing it in as a weak dep via HDF5.jl (slightly tricky with the type, but I guess that type could be defined outside the extension) Similarly (as previously discussed) could be done for JLD2.jl
Currently, there isn't any interface for reading this back in, but one could define
function FerriteHDF5(path::String, G::Type{<:AbstractGrid})
h5f = h5open(path, "r")
grid = dict_to_grid(G, read(h5f["grid"]))
return FerriteHDF5(grid, h5f)
end
# Similar to above, a do-construct can be defined as well.
# The following methods could then be defined in Ferrite if we have an importer functionality,
# but initially could be defined by the package defining FerriteHDF5...
read_celldata(::FerriteHDF5, key)
(Of course, I'm biased here since I have #692 in my head)
I'm not sure, but why should this be type piracy if we own the type Grid
?
From inside Ferrite it is perfectly fine! (I was referring to doing this in an external package)
I think the choice for having this supported by Ferrite is more that one has to commit to one backend as maintaining multiple might be too much to promise. HDF5 might certainly be a nice choice!
Out of interest, why choosing this over JLD2 as was discussed above? Or would this somehow work naturally with JLD2 since that is hdf5-based?
I received some reviewer comments in a funding proposal that something more established like HDF5 should be used instead of JLD2 (due to its "known problems"). The "known problems" weren't specified, but I think there is some consensus that HDF5 is somewhat reliable and established. Aside that, jld2 leaks quite severely memory (not sure if HDF5 has the same problem)
Thanks - that is interesting to get from a reviewer (even if vague)
At least the current problem with the circular dependency for the DofHandler
could be another argument. But it just feels like a lot of things that "just work" with JLD2 would have to be manually implemented for HDF5. On the other hand, perhaps that would be a more reliable solution...
In the future, I plan to overhaul the IO part of FerriteProblems (which currently uses JLD2), and after that, I might have better views/opinions. But it would be great if there already is a solution to tap into then:)
Do you happen to know if JLD2 allows for the same setindex!
overload? I find it quite nice that you don't need to promise a mapping with HDF5 to a Julia user written composite type but instead, you can rely on primitives or array of primitives. So I like that a "custom serialization" can be done without having a bijective map from julia object to storage and vice verca, but instead you can just map to storage. This can make sense sometimes, e.g. material state structs which may change during development but most likely you will save from the get go the interesting data. Of course, you can have for JLD2 export another struct which holds the interesting data to store, but again, as soon as you change, you have the annoyance of type remapping
Don't know, but doubt it since JLD2.jl doesn't have HDF5.jl as dep. JLD.jl does, maybe it works for that one? But also not sure if this design was causing JLD2.jl to be created as the replacement.
I meant instead of
julia> function Base.setindex!(parent::HDF5.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
parent["grid/cells"] = grid.cells
parent["grid/nodes"] = grid.nodes
for (facesetname,faceset) in grid.facesets
parent["grid/facesets/$facesetname"] = collect(faceset)
end
end
julia> h5open("blah.h5", "w") do h5f
h5f["grid"] = grid
end
this
julia> function Base.setindex!(parent::JLD2.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
parent["grid/cells"] = grid.cells
parent["grid/nodes"] = grid.nodes
for (facesetname,faceset) in grid.facesets
parent["grid/facesets/$facesetname"] = collect(faceset)
end
end
julia> jldopen("blah.jld2", "w") do jld
jld["grid"] = grid
end
I meant instead of
julia> function Base.setindex!(parent::HDF5.File, val::Grid, path::Union{AbstractString,Nothing}; pv...) parent["grid/cells"] = grid.cells parent["grid/nodes"] = grid.nodes for (facesetname,faceset) in grid.facesets parent["grid/facesets/$facesetname"] = collect(faceset) end end julia> h5open("blah.h5", "w") do h5f h5f["grid"] = grid end
this
julia> function Base.setindex!(parent::JLD2.File, val::Grid, path::Union{AbstractString,Nothing}; pv...) parent["grid/cells"] = grid.cells parent["grid/nodes"] = grid.nodes for (facesetname,faceset) in grid.facesets parent["grid/facesets/$facesetname"] = collect(faceset) end end julia> jldopen("blah.jld2", "w") do jld jld["grid"] = grid end
The recommended way to do this would be to define a custom serialization: https://juliaio.github.io/JLD2.jl/dev/customserialization/
https://juliaio.github.io/JLD2.jl/dev/advanced/
The advanced docs explain how one can deal with structures that have changed (need custom conversion).
However, if you want to be able to upgrade structures without the user coming into contact with it,
you would probably need to write your own write_to_file(filename, object)
functions
(that then write and read the relevant info of your structures and manually ensure backwards compatibility by adding fallbacks every time the structures change).
If you choose to go down that route, it doesn't matter whether you use JLD2 or HDF5. The files will not differ much.
Aside that, jld2 leaks quite severely memory (not sure if HDF5 has the same problem)
I have not encountered that before. Please file an issue, if you can trigger this.
The recommended way to do this would be to define a custom serialization: https://juliaio.github.io/JLD2.jl/dev/customserialization/
https://juliaio.github.io/JLD2.jl/dev/advanced/ The advanced docs explain how one can deal with structures that have changed (need custom conversion). However, if you want to be able to upgrade structures without the user coming into contact with it, you would probably need to write your own
write_to_file(filename, object)
functions (that then write and read the relevant info of your structures and manually ensure backwards compatibility by adding fallbacks every time the structures change). If you choose to go down that route, it doesn't matter whether you use JLD2 or HDF5. The files will not differ much.
I think I was thinking more about the latter case, because the datastructure Grid
won't change across version so much (or at all), that you cannot reconstruct it from the stuff that I have in the snippet above.
Aside that, jld2 leaks quite severely memory (not sure if HDF5 has the same problem)
I have not encountered that before. Please file an issue, if you can trigger this.
It's quite hard to come up with a reproducer, it happens for me only in simulations that compute for days. Also julia threads error for me in that case after closing the REPL. I hope I can come up with a reproducer at some point.
Or to put it differently: I can think of a lot of cases, where I don't want to have an explicit bijective mapping from storage type to (actual user) julia type. Sometimes I just want to dump data (and not being able to reconstruct the type from which I gathered the data) and read it out as such (NamedTuple
for instance), especially in the beginning of a project, where you do a lot of changes to structs. There, I find the type remapping a little bit annoying because I cannot promise to have the mapping from storage to julia type. Is there some recommend way to save instead of the type only some fields which consists of Julia primitives or arrays of primitives, more or less in the way I did with the setindex!
dispatch for Grid
but in JLD2
@JonasIsensee ? Can I do the same overload without writing a seperate write_to_file
function? Or do you think this is exactly where I should use JLD2 instead of HDF5?
@koehlerson I've thought about this kind of use case myself for quite a bit and must say that I don't have a great solution either. The JLD2 custom serialization interface is great if you have
struct ComplexStruct
# lot's of custom stuff
end
struct SerializedVersion
# important data
# only contains basic types (array / int / float / string )
end
JLD2.writeas(::Type{ComplexStruct}) = SerializedVersion
JLD2.wconvert(::Type{SerializedVersion}, x::ComplexStruct) = # conversion
JLD2.rconvert(::Type{ComplexStruct}, x::SerializedVersion) = # convert back / upgrade
# as long as SerializedVersion is unchanged, ComplexStruct may be changed (updating the conversion accordingly)
# if the serialized version is changed, the old one should be kept and the new one needs a new name
# e.g. SerializedVersionV1
# to ensure full backward compatibility
Caveat: JLD2 stores the type signature of the ComplexStruct
. If this has extensive type parameters (function types, nested parameterized types etc.) this can become a problem.
EDIT:
Given that you keep mentioning setindex!
, you seem to care about the user interface.
It is possible but I'm not sure if overloading setindex!
is really a good idea. You'd have to inject your code into the JLD2 mechanism properly to ensure that file["key"] = (, object)
also does the right thing?
Semantically it seems rather brittle to have JLD2 only do the right conversion thing (your code)
when the Grid
is a toplevel storing call. (not wrapped in some other structure)
It is possible but I'm not sure if overloading setindex! is really a good idea. You'd have to inject your code into the JLD2 mechanism properly to ensure that file["key"] = (, object) also does the right thing?
I'd love to have some intermediate dispatch stuff_to_save
, which is for type Any
an identity mapping and can be dispatched by the user for custom types to the subset of things that HDF5 supports ootb like primitives, array of primitives and isbitstype compound types, such that it stores by default the returned things in a named tuple (or something similar). The setindex!
is more or less just the quick and dirty way to have that in HDF5.jl. I'm not sure how feasible that is or which pitfalls exist along that route but I think it would be nice to have an "optional plain HDF5 data" behavior in JLD2. However, I'm not an expert or anything, it's just the ideal behavior I have in mind after using JLD2 for some years.
Semantically it seems rather brittle to have JLD2 only do the right conversion thing (your code) when the Grid is a toplevel storing call. (not wrapped in some other structure)
I was hoping that setindex!
is called recursively but you are right, if that's not the case this stuff_to_save
call should happen for anything that should be saved, i.e. for each field of each struct.
This sounds like a really nice idea. I think the best way to experiment with this would be to create a new JLD2 variant package that depends on JLD2 but has extra asserts on the types it handles and adds your intermediate conversion level.
One of the things that limits flexibility in JLD2 is that it tries to be type stable to be fast. Building efficient memory layouts for structures also requires quite some knowledge about the types..
Further discussion might be more appropriate in JLD2. I'd be willing to assist in efforts in that direction.
I use JLD2 for saving my simulation results. The two most important pieces that I save are solution vectors and a
DofHandler
. The only annoying part is if structs change they can't be reloaded that easily. However, JLD2 offers a lot of API to cover that by e.g. custom serialization and type remappingWhat do you think about providing a binary export extension for e.g. JLD2 where we specify a custom serialization for the
DofHandler
?