ericphanson closed this 6 months ago
I was able to work around it with the following code:
julia> function write_variable_length_string_attribute(fid, attr_key::String, attr_value::String)
           attr = create_attribute(fid, attr_key, datatype(String), dataspace(String))
           v = Vector{UInt8}(attr_value)
           GC.@preserve v begin
               p = pointer(v)
               write_attribute(attr, datatype(String), Ref(p))
           end
           return nothing
       end
write_variable_length_string_attribute (generic function with 1 method)
julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5
julia> write_variable_length_string_attribute(fid, "attr-key", "attr-value")
julia> close(fid)
shell> h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   ATTRIBUTE "attr-key" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "attr-value"
      }
   }
}
}
The context here is that I need to write a variable-length string as an attribute, so that Python code using h5py
will interpret the attribute as a string and not as a NumPy byte string (xref https://docs.h5py.org/en/stable/strings.html).
I don't know if something specific should be done to ensure null termination. I added a branch here, though the output seems exactly the same:
julia> using HDF5
julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5
julia> function write_variable_length_string_attribute(fid, attr_key::String, attr_value::String)
           attr = create_attribute(fid, attr_key, datatype(String), dataspace(String))
           v = Vector{UInt8}(attr_value)
           # null termination? (the isempty check avoids a BoundsError on "")
           (!isempty(v) && v[end] == 0) || push!(v, 0)
           GC.@preserve v begin
               p = pointer(v)
               write_attribute(attr, datatype(String), Ref(p))
           end
           return nothing
       end
write_variable_length_string_attribute (generic function with 1 method)
julia> write_variable_length_string_attribute(fid, "attr-key", "attr-value")
julia> close(fid)
shell> h5dump test.input
h5dump error: unable to open file "test.input"
shell> h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   ATTRIBUTE "attr-key" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "attr-value"
      }
   }
}
}
We should update the docs to recommend everyone use attrs
attrs(fid)["attr-key"] = "attr-value"
I don't get why this is segfaulting though?
Oh, I misunderstood: by default we write them as fixed-length strings...
It looks like we have to pass a pointer to a string pointer.
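To make the fixed-length versus variable-length distinction concrete, here is a minimal sketch that compares the datatype built from a String value with the one built from the String type. It assumes that HDF5.API exposes h5t_is_variable_str (a wrapper around the C function H5Tis_variable_str) and that it accepts a Datatype object; the variable names are only illustrative.

```julia
using HDF5

# Datatype derived from a String *value* versus from the String *type*.
from_value = datatype("attr-value")
from_type  = datatype(String)

# Assumption: HDF5.API.h5t_is_variable_str wraps H5Tis_variable_str
# and can be called on a Datatype object directly.
value_is_vlen = HDF5.API.h5t_is_variable_str(from_value)
type_is_vlen  = HDF5.API.h5t_is_variable_str(from_type)

println("datatype(\"attr-value\") variable-length? ", value_is_vlen)
println("datatype(String) variable-length? ", type_is_vlen)
```

If the value-based write path derives its type from datatype(value), this would explain why a plain write produces a fixed-length string while create_attribute with datatype(String) produces H5T_VARIABLE.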
Is
function write_variable_length_string_attribute(fid, attr_key::String, attr_value::String)
    attr = create_attribute(fid, attr_key, datatype(String), dataspace(String))
    v = Vector{UInt8}(attr_value)
    # null termination? (the isempty check avoids a BoundsError on "")
    (!isempty(v) && v[end] == 0) || push!(v, 0)
    GC.@preserve v begin
        p = pointer(v)
        write_attribute(attr, datatype(String), Ref(p))
    end
    return nothing
end
safe/legit? It seems to work, but I don't really know what I am doing
The most reliable option is to use cconvert/unsafe_convert:
julia> using HDF5
julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5
julia> attr = create_attribute(fid, "attr-name", datatype(String), dataspace(String))
🏷️ HDF5.Attribute: attr-name
julia> val = Base.cconvert(Cstring, "attr-val") # ensures string is nul-terminated
"attr-val"
julia> GC.@preserve val begin
           p = Base.unsafe_convert(Cstring, val)
           write_attribute(attr, datatype(String), Ref(p))
       end
julia> close(fid)
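As a sanity check on the cconvert/unsafe_convert approach, the following sketch writes the attribute to a temporary file, reopens it, and reads it back. roundtrip_vlen_attribute is a hypothetical helper written for this illustration, not part of HDF5.jl, and the use of HDF5.API.h5t_is_variable_str on the stored datatype is an assumption about the low-level API.

```julia
using HDF5

# Hypothetical helper (not part of HDF5.jl): write `value` as a
# variable-length string attribute, then reopen the file and read it back.
function roundtrip_vlen_attribute(path::String, key::String, value::String)
    h5open(path, "w") do fid
        attr = create_attribute(fid, key, datatype(String), dataspace(String))
        # Julia Strings carry a trailing NUL; unsafe_convert to Cstring
        # throws if the string contains an embedded NUL.
        buf = Base.cconvert(Cstring, value)
        GC.@preserve buf begin
            p = Base.unsafe_convert(Cstring, buf)
            # HDF5 expects a pointer to the string pointer for variable-length strings.
            write_attribute(attr, datatype(String), Ref(p))
        end
        close(attr)
    end
    return h5open(path, "r") do fid
        attr = open_attribute(fid, key)
        # Assumption: h5t_is_variable_str accepts the Datatype returned by datatype(attr).
        is_vlen = HDF5.API.h5t_is_variable_str(datatype(attr))
        stored = read_attribute(fid, key)
        close(attr)
        (is_vlen, stored)
    end
end

is_vlen, stored = roundtrip_vlen_attribute(tempname() * ".h5", "attr-name", "attr-val")
println("variable-length: ", is_vlen, ", value: ", stored)
```

Checking the stored datatype from Julia is a rough substitute for running h5dump and looking for STRSIZE H5T_VARIABLE.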
Also: the documentation around create_attribute is confusing. It says to use write_attribute to actually write the data, but if you don't use the returned Attribute object from create_attribute (instead using write_attribute(fid, "attr-key", "attr-value")), then it will error and say the attribute already exists.
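A small reproduction sketch of the behaviour described above, using only the calls that appear in this thread. name_based_write_errors is a hypothetical helper name; it reports whether the name-based write_attribute call throws after create_attribute has already created the attribute, and the result may differ between HDF5.jl versions.

```julia
using HDF5

# Hypothetical helper: returns true if writing by name fails after the
# attribute was already created with create_attribute.
function name_based_write_errors(path::String)
    h5open(path, "w") do fid
        attr = create_attribute(fid, "attr-key", datatype(String), dataspace(String))
        errored = false
        try
            # Does not use `attr`; per the report above this attempts to
            # create "attr-key" a second time.
            write_attribute(fid, "attr-key", "attr-value")
        catch err
            errored = true
            @warn "name-based write_attribute failed" exception = err
        end
        close(attr)
        errored
    end
end

println("write by name errored: ", name_based_write_errors(tempname() * ".h5"))
```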