occasional corruption while loading `Matrix{Rational{BigInt}}`

mchristianl commented 9 months ago

Hi, I am observing a corrupt loading of a Matrix{Rational{BigInt}} from a jld2 file every $7n+2$ th time.

An example for the file: RationalBigIntMatrix.tar.gz

Is there any possibility to find out what's going wrong here?

example code:

using JLD2

fname = joinpath(@__DIR__,"RationalBigIntMatrix.jld2")
A0 = Rational{Int64}.(load(fname)["content"]) # this contains a Matrix{Rational{BigInt}}
for _ in 1:50
  if A0 == load(fname)["content"]
    print("✓")
  else
    print("↯")
  end
end

output:

↯✓✓✓✓✓✓↯✓✓✓✓✓✓✓↯✓✓✓✓✓✓↯✓✓✓✓✓✓↯✓✓✓✓✓✓↯✓✓✓✓✓✓↯✓✓✓✓✓✓

Bigger matrices give more ↯

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 4 on 4 virtual cores
Environment:
  JULIA_LOAD_PATH = /usr/share/gmsh/api/julia/:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

JonasIsensee commented 9 months ago

Hi @mchristianl , thank you for this report.

I have tracked the problem down to here: https://github.com/JuliaIO/JLD2.jl/blob/a0d0730c45637f4e982c145a4f353b41603f0f53/src/data/custom_serialization.jl#L46

BigInts are mutable and have a custom finalizer (for freeing the memory of the GMP library). The conversion function copies the memory of mutable structs to a pre-allocated one in order to allow recursively nested structures. This is preciscely the problem here. The finalizer may be triggered when GC runs.

JonasIsensee commented 9 months ago

Defining the following avoids the problem but I hope that there's a more general solution.

function JLD2.jlconvert(::JLD2.ReadRepresentation{BigInt,JLD2.CustomSerialization{String,JLD2.Vlen{String}}}, f::JLD2.JLDFile, ptr::Ptr, header_offset::JLD2.RelOffset)
                 JLD2.rconvert(BigInt, JLD2.jlconvert(JLD2.ReadRepresentation{String, JLD2.Vlen{String}}(), f, ptr, header_offset))
end

mchristianl commented 9 months ago

I can confirm that this fixes the symptoms. Thank you @JonasIsensee ! :partying_face:

Meanwhile I was using the following improvised workaround serializing a Array{Rational{BigInt}} into two Array{Int64} and two Vector{UInt64} "buffers". I am not sure if this is always true for various GMP configurations.

function serialize_bigrat(A0::Array{Rational{BigInt}})
  num_last = Array{Int64}(undef,size(A0))
  den_last = Array{Int64}(undef,size(A0))
  j = 0; for i in eachindex(A0); num_last[i] = (j += abs(A0[i].num.size)); end
  j = 0; for i in eachindex(A0); den_last[i] = (j += abs(A0[i].den.size)); end
  num_buf = Vector{UInt64}(undef,num_last[end])
  den_buf = Vector{UInt64}(undef,den_last[end])
  for i in eachindex(A0)
    x = A0[i]
    @assert -100 ≤ x.num.size ≤ 100 i # safety check
    @assert -100 ≤ x.den.size ≤ 100 i # safety check
    num_last[i] = copysign(num_last[i],x.num.size)
    unsafe_copyto!(pointer(num_buf,abs(num_last[i]) - abs(x.num.size) + 1),x.num.d,abs(x.num.size))
    unsafe_copyto!(pointer(den_buf,abs(den_last[i]) - abs(x.den.size) + 1),x.den.d,abs(x.den.size))
  end
  return (num_last,den_last,num_buf,den_buf)
end
function deserialize_bigrat(num_last::Array{Int64},den_last::Array{Int64},num_buf::Vector{UInt64},den_buf::Vector{UInt64})
  @inline function deserialize_BigInt(x_last,x_buf,i)
    n = x_last[i]
    s = abs(n) - abs(i > 1 ? x_last[i-1] : 0)
    x = BigInt(nbits=64*s)
    @assert x.alloc >= s
    x.size = copysign(s,n)
    unsafe_copyto!(x.d,pointer(x_buf,abs(n)-s+1),s)
    x
  end
  A1 = Array{Rational{BigInt}}(undef,size(num_last))
  for i in eachindex(num_last)
    A1[i] = deserialize_BigInt(num_last,num_buf,i) // deserialize_BigInt(den_last,den_buf,i)
  end
  return A1
end

JuliaIO / JLD2.jl

occasional corruption while loading `Matrix{Rational{BigInt}}` #512