JuliaIO / BSON.jl

Other
158 stars 39 forks source link

Cannot save array with undefined values #43

Open DilumAluthge opened 5 years ago

DilumAluthge commented 5 years ago

I am opening this issue so we can keep track of our progress in solving this bug.

Currently, BSON is unable to save an array that has one or more undefined values. Here is the minimum working example:

using BSON
x = Array{String, 1}(undef, 5)
x[1] = "a"
x[4] = "d"
println(x)
bson("test.bson", Dict(:x => x))

Expected behavior: BSON saves the data as requested.

Actual behavior: the following error is thrown:

ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] lower at /Users/dilum/.julia/packages/BSON/XPZLD/src/extensions.jl:66 [inlined]
 [2] _lower_recursive(::Array{String,1}, ::IdDict{Any,Any}, ::Array{Any,1}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:62
 [3] (::getfield(BSON, Symbol("##7#11")){IdDict{Any,Any},Array{Any,1}})(::Array{String,1}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:62
 [4] applychildren!(::getfield(BSON, Symbol("##7#11")){IdDict{Any,Any},Array{Any,1}}, ::Dict{Symbol,Any}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/BSON.jl:21
 [5] _lower_recursive(::Dict{Symbol,Array{String,1}}, ::IdDict{Any,Any}, ::Array{Any,1}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:62
 [6] lower_recursive(::Dict{Symbol,Array{String,1}}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:73
 [7] bson(::IOStream, ::Dict{Symbol,Array{String,1}}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:81
 [8] #14 at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:83 [inlined]
 [9] #open#310(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(BSON, Symbol("##14#15")){Dict{Symbol,Array{String,1}}}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:375
 [10] open at ./iostream.jl:373 [inlined]
 [11] bson(::String, ::Dict{Symbol,Array{String,1}}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:83
 [12] top-level scope at none:0
Codyk12 commented 5 years ago

It seems the culprit is the ... operator. More specifically the code Any[x...] on line ([1] lower at /Users/.../.julia/packages/BSON/XPZLD/src/extensions.jl:66)

This is where it tries to unpack the values of the array into a new array, but this doesn't work on arrays with '#undef' in them.

This behavior works with a different type of array though:

using BSON x = Array{Int, 1}(undef, 5) x[1] = 1 x[4] =3 println(x) bson("test.bson", Dict(:x => x))

BSON is able to save as expected because x gets initialized with zeros so the ... operator has something to unpack, where as Array{String, 1}(undef, 5) is initialized with #undef, so the ... operator breaks.

This seems like an even more basic issue and I see it as two solutions:

DilumAluthge commented 5 years ago
  • Change the functionality of the ... operator to be able to pass along undef somehow. (This option seems much more generalizable and the right thing to do.

@MikeInnes and I experimented with some code that bypassed the ... splatting step. We still ran into an issue writing and reading the #undef elements. So fixing the splatting won't solve this issue.

Change how UndefInitializer() acts with arrays of type String to generate empty string, or something concrete.

Certainly that would work, but would require a breaking change in the Julia language, and I'm not sure if there is support for that.

Codyk12 commented 5 years ago

Does that mean someone just needs to find a way to read and write #undef elements? Is that the main fix that needs to happen with this issue?

DilumAluthge commented 5 years ago

Yep!

Codyk12 commented 5 years ago

@DilumAluthge, So if this is fixed you and @MikeInnes can patch the splat operator issue? Because for this whole problem to be solved that will also need to be fixed.

Also have you worked on looking at the underlying issue of reading and writing #undef? I don't want to reinvent the wheel if you guys already have good leads.

DilumAluthge commented 5 years ago

@DilumAluthge, So if this is fixed you and @MikeInnes can patch the splat operator issue? Because for this whole problem to be solved that will also need to be fixed.

Yep! Take a look here at an example of how we can replace the splat operator with a function collect_any. So that patch is already written.

Also have you worked on looking at the underlying issue of reading and writing #undef? I don't want to reinvent the wheel if you guys already have good leads.

You can take a look at Mike's work in https://github.com/MikeInnes/BSON.jl/pull/8. In PR #8, Mike wrote nothings in place of #undef. Unfortunately, this ends up giving you the wrong eltype. For example, you'd end up with an Array{Union{Nothing, String}} instead of the desired Array{Nothing}. So I don't think that writing nothings in place of #undef is a good solution.

I think the best solution is going to be to write a unique sentinel in place of the #undef values. The BSON specification actually reserves \x06 for "undefined" values, so I think we should use that. (Technically it's now deprecated, but as Mike pointed out in another thread, it's okay for us to deviate a little from the spec. We are not striving for 100% compatibility with other implementations of BSON.)

Codyk12 commented 5 years ago

47 Hopefully should take are of these issues. @MikeInnes

AzamatB commented 4 years ago

Bump. Was bitten by this