Jerboa-app closed this issue 1 year ago
The issue appears to be that it is a SubString, which isn't NUL-terminated:
julia> using HDF5
julia> data = h5open("t.h5","w")
🗂️ HDF5.File: (read-write) t.h5
julia> create_group(data,view("abcdef",1:3))
📂 HDF5.Group: /abcdef (file: t.h5)
Specifically, I think the issue is that we pass the string as Ptr{UInt8} instead of Cstring:
https://github.com/JuliaIO/HDF5.jl/blob/566698bfebb29d2b1af21a05153270a7d28923ff/src/api/functions.jl#L1770
Ah, that might explain things. So it just continues reading past the end of the substring?
> Specifically, I think the issue is that we pass the string as Ptr{UInt8} instead of Cstring:
Basically, if we had Cstring, then Base.cconvert would convert the SubString to a String.
julia> Base.cconvert(Cstring, view("abcdefg",1:3))
"abc"
julia> Base.cconvert(Cstring, view("abcdefg",1:3)) |> typeof
String
That said, we probably should not accept an AbstractString if we are not handling this properly.
https://github.com/JuliaLang/julia/blob/a1b546ad04612272787d9ad70444ebe2dc58ac9f/base/c.jl#L195-L198
cconvert(::Type{Cstring}, s::String) = s
cconvert(::Type{Cstring}, s::AbstractString) =
    cconvert(Cstring, String(s)::String)
This is better handled by using Cstring in the ccall type signature.
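To make the mechanism concrete, here is a minimal sketch that does not need HDF5 at all. It assumes a platform where `strlen` from the C standard library can be resolved by `ccall` (typical on Linux and macOS); `strlen` only stands in for an HDF5 API function, and the variable names are made up for illustration.

```julia
# A String is NUL-terminated in memory; a SubString is only an offset and a
# length into its parent, so nothing in memory marks where the view ends.
parent_str = "abcdef"
sub = view(parent_str, 1:3)   # SubString{String} with contents "abc"

# Reading from the raw pointer as a C string runs on to the parent's terminator.
overrun = GC.@preserve parent_str unsafe_string(pointer(sub))

# Passing an explicit length stays within the view.
bounded = GC.@preserve parent_str unsafe_string(pointer(sub), ncodeunits(sub))

# Converting for Cstring copies the view into a fresh, NUL-terminated String.
converted = Base.cconvert(Cstring, sub)

# With Cstring in the signature, ccall performs that conversion automatically.
# Assumption: `strlen` is resolvable in the current process.
len = ccall(:strlen, Csize_t, (Cstring,), sub)

println((overrun, bounded, converted, Int(len)))
```

The first read shows the same kind of overrun as the create_group example above, where the group was named /abcdef rather than /abc.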
When I try to create_group using a string taken from a CSV, raw[1,1], read via readdlm, HDF5.jl seems to use the entire CSV file (as one long string) as the group's key.
Basic example:
Even though raw[1,1] is "a", the key is the entire CSV file.
But if I hard-code the same data, it uses the single value "a".
If I instead convert to a String, it works fine.
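A rough sketch of that workaround, assuming the cells returned by readdlm are views into the file contents (the behaviour described in this thread). The two-line CSV is a hypothetical stand-in for the real file, and the create_group call is left as a comment so the snippet runs without HDF5 installed.

```julia
using DelimitedFiles

# Hypothetical two-line CSV standing in for the real file.
raw = readdlm(IOBuffer("a,b\nc,d\n"), ',')

cell = raw[1, 1]     # may be a SubString pointing into the whole file buffer
key = String(cell)   # independent, NUL-terminated copy of just this cell

println(typeof(cell), " -> ", typeof(key))

# `key` can then be used as the group name, e.g. create_group(file, key)
```

Calling String on a value that is already a String is harmless, so the conversion can be applied unconditionally to every key read from the file.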
Only difference I see is raw is
and
but
Is there a reason for this?
I found this when a group key in my dataset turned out to be a 1,214,405-line (30 MB) string!
Thanks in advance!