JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

Slash at beginning of key in dictionary not saved #562

Closed jfdev001 closed 3 months ago

jfdev001 commented 3 months ago

Perhaps I am missing something obvious here, but if I have a dictionary whose keys begin with a forward slash "/", the leading slash is lost when I save the dictionary to a JLD2 file and load it again. Is this expected behavior?

# file named "example.jl"
using JLD2

path = "/var/"     # path with leading /
annotation = 3000  # arbitrary annotation about the data
d = Dict([path => annotation])
save("tmp.jld2", d)
tmp_d = load("tmp.jld2")
@show keys(d)     # original, ["/var/"]
@show keys(tmp_d) # loaded, ["var/"] <--- no leading slash!!!
@assert all(keys(d) .∈  Ref(keys(tmp_d)))

and the output is

julia> include("example.jl")
keys(d) = ["/var/"]
keys(tmp_d) = ["var/"]
keys(d) .∈ Ref(keys(tmp_d)) = Bool[0]
ERROR: LoadError: AssertionError: all(keys(d) .∈ Ref(keys(tmp_d)))
Stacktrace:
 [1] top-level scope
   @ ~/Dev/misc-julia/FailStringJLD2/example.jl:11
 [2] include(fname::String)
   @ Base.MainInclude ./client.jl:489
 [3] top-level scope
   @ REPL[2]:1

JonasIsensee commented 3 months ago

Hi @jfdev001, saving a dictionary via save(file, dict) interprets the keys much like an operating system interprets file paths. Slashes separate folders (groups). A leading slash refers to the root group, which is redundant here. The convention upon loading is not to prepend the slash to every key.

julia> d = Dict("a" => 1, "/b" => 2, "c/d/e" => 3)
Dict{String, Int64} with 3 entries:
  "c/d/e" => 3
  "/b"    => 2
  "a"     => 1

julia> save("test.jld2", d)

julia> f = jldopen("test.jld2")
JLDFile /home/jisensee/test.jld2 (read-only)
 ├─🔢 b
 ├─🔢 a
 └─📂 c
    └─📂 d
       └─🔢 e
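
For the assertion in the original report, one option is to normalize the original keys the same way before comparing them with the loaded ones. This is only a minimal sketch: `strip_root` is a hypothetical helper, not a JLD2 function, and it assumes that exactly one leading slash is dropped on loading, as in the output shown in this thread.

```julia
# Hypothetical helper (not part of JLD2): drop a single leading "/" from a key.
# Assumption: loading omits only the root-group slash, as reported above.
strip_root(key::AbstractString) = startswith(key, "/") ? String(key[2:end]) : String(key)

original_keys = ["/var/"]   # keys(d) from the report
loaded_keys   = ["var/"]    # keys(tmp_d) from the report

# Compare after normalizing the original keys.
matches = all(strip_root.(original_keys) .∈ Ref(loaded_keys))
```

Keys without a leading slash pass through unchanged, so the helper can be applied to every key.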

If you really want to store a dict whose string keys contain /, you could save the whole dict as a single dataset, e.g.

julia> save("test.jld2", "dict_dataset", d)

julia> f = jldopen("test.jld2")
JLDFile /home/jisensee/test.jld2 (read-only)
 └─🔢 dict_dataset

julia> f["dict_dataset"]
Dict{String, Int64} with 3 entries:
  "c/d/e" => 3
  "/b"    => 2
  "a"     => 1

Note that the former approach lets you load or access parts of the dict individually, while the latter does not.
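
The difference between the two approaches can be sketched as follows. The file names and the temporary directory are placeholders chosen for illustration; the calls are the `save` and `load` functions already used in this thread, with `load(file, name)` reading a single named entry.

```julia
using JLD2

d = Dict("a" => 1, "/b" => 2, "c/d/e" => 3)

# Work in a temporary directory so no files are left behind.
dir = mktempdir()
groups_file = joinpath(dir, "groups.jld2")   # placeholder file name
single_file = joinpath(dir, "single.jld2")   # placeholder file name

# Approach 1: keys are treated as group paths, so each value is its own dataset.
save(groups_file, d)
e = load(groups_file, "c/d/e")   # reads only this one entry

# Approach 2: the whole Dict is stored as one dataset and must be loaded
# in full before any entry can be looked up.
save(single_file, "dict_dataset", d)
whole = load(single_file, "dict_dataset")
b = whole["/b"]                  # the key is an ordinary string here, slash included
```

With the first approach the file layout mirrors the key structure. With the second, the keys round-trip unchanged, at the cost of always reading the complete dictionary.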