JuliaIO / BSON.jl


BSON files saved on a 64-bit system cannot be loaded on a 32-bit system #41

Closed oxinabox closed 5 years ago

oxinabox commented 5 years ago

When is a Core.Array{UInt8,1} not a Core.Array{UInt8,1}?

Answer: When (Core.Array{UInt8,1}).uid is not 27.

How does it get that way?

idk exactly, but it looks like when BSON deserializes, on a 32-bit Julia, a Dict{String, UInt8} that was serialized on a 64-bit Julia, it creates a new type with the same name in the same module but with a different uid from the true type.

This is from https://github.com/invenia/JLSO.jl/pull/11 and is why the AppVeyor 32-bit tests are failing (the failures are not showing because I marked them as allowed failures). A JLSO file is just a BSON file anyway.

MWE:

On a 32-bit system only:

using BSON
filename = download("https://github.com/invenia/JLSO.jl/blob/4386519551912c54936ca89e61c4a352e0a2ebf8/test/specimens/v2_julia_serialize_none.jlso?raw=true")

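begin
    try
        BSON.load(filename)
    catch e
        # Keep the exception in a global so it can be inspected afterwards.
        # The catch variable is named `e` so it does not clash with the global `err`.
        global err = e
        @show typeof(err)
        @show err.f
        @show typeof.(err.args)
        # List the methods of the called function that accept these argument types
        @show methods(err.f, map(typeof, err.args))
    end
end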

Output:

typeof(err) = MethodError
err.f = Array{UInt8,1}
typeof.(err.args) = (Array{UInt8,1},)
methods(err.f, map(typeof, err.args)) = # 0 methods for generic function "(::Type)":
# 0 methods for generic function "(::Type)":

I think a smaller working example can be created by running the following on a 64-bit computer:

bson("mwe.bson", obj=Dict("name"=>[0x1, 0x2]))

but I have not tested it.

oxinabox commented 5 years ago

Addendum: if it wasn't clear, it is the Dict's type param that is deserializing wrong.

oxinabox commented 5 years ago

The MWE of bson("mwe.bson", obj=Dict("name"=>[0x1, 0x2])) does indeed reproduce this.

And it fails in either direction: 32-bit can't load files written on 64-bit, and 64-bit can't load files written on 32-bit.

The attached .zip contains files produced by that MWE on both 32-bit and 64-bit systems: mwes.zip

You just need to call BSON.load on them (on the wrong architecture) and they will error.

Note that: BSON.parse("mwe_from32.bson") == BSON.parse("mwe_from64.bson")

But BSON.raise_recursive(BSON.parse("mwe_from32.bson")) errors on a 64-bit system, while the same call on mwe_from64.bson does not.

The fact that the two files BSON.parse to be == and yet one breaks on BSON.load and the other does not…

All I can think is that somewhere a Dict was serialized improperly (since 32-bit and 64-bit hash differently), and that it is then actually being deserialized into a subtly corrupt state.

I am going to stop trying to debug it for now.

oxinabox commented 5 years ago

AHAHAHA got it. When is an Array{UInt8, 1} not an Array{UInt8, 1}?

When the Int literals have different types.
As in, the 1 is actually an Int32 on one system and an Int64 on the other.
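
The effect can be sketched without BSON at all. This is only an illustration: Val stands in for any parametric type, and the names one32, one64 and describe are made up for the example. Type parameters are compared by identity, so two integers that are == but of different widths yield two distinct types that print alike.

```julia
# Val is used here only as a stand-in for any parametric type.
one32 = Int32(1)
one64 = Int64(1)

same_value    = one32 == one64              # numerically equal
same_identity = one32 === one64             # not identical: different bit types
same_type     = Val{one32} === Val{one64}   # so the parameterized types differ too

println((same_value, same_identity, same_type))

# A method defined for one parameterization does not apply to the other,
# which is the shape of the MethodError reported above.
describe(::Val{one64}) = "matched"
applies_to_64 = hasmethod(describe, Tuple{Val{one64}})
applies_to_32 = hasmethod(describe, Tuple{Val{one32}})
println((applies_to_64, applies_to_32))
```

On either architecture exactly one of the two parameterizations coincides with the native Val{1}, which would be consistent with the failure appearing in both directions.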

A short-term hack is to normalize integer literals that occur in type params to the Int type of the system they are running on. idk if that is a complete solution though. I feel like pathological types/methods can exist that care what the type of their type-params is. (Not directly, but via turning them into values and then dispatching on that.)
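
A minimal sketch of what that normalization could look like, assuming the deserializer has the type's parameters in hand as a tuple before it rebuilds the type. normalize_param, rebuild and param_kind are hypothetical names for this example, not BSON.jl functions.

```julia
# Hypothetical helper, not part of BSON.jl: map fixed-width Int literals
# onto the native Int of the running system.
normalize_param(x::Union{Int32,Int64}) = Int(x)   # throws InexactError if the value does not fit
normalize_param(x) = x                            # types, symbols, Bools, etc. pass through

# Hypothetical helper: rebuild a parametric type from its deserialized parameters.
rebuild(T, params) = T{map(normalize_param, params)...}

from32 = rebuild(Array, (UInt8, Int32(1)))   # as if written on a 32-bit system
from64 = rebuild(Array, (UInt8, Int64(1)))   # as if written on a 64-bit system
println(from32 === from64)

# The caveat: code that turns the parameter back into a value can observe
# its width, and normalization erases that information.
param_kind(::Val{N}) where {N} = typeof(N)
before = param_kind(Val{Int32(1)}())
after  = param_kind(rebuild(Val, (Int32(1),))())
println((before, after))
```

With this sketch both files rebuild to the native Vector{UInt8}, but a type that was deliberately parameterized by an Int32 on a 64-bit system would come back parameterized by an Int64, which is the pathological case mentioned above.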