Closed denizyuret closed 10 months ago
JLD2 has never really been supported. I guess the fact that it worked was just sheer luck? In any case, I'm not familiar with JLD2, so I'll defer to anybody who is to take a look 🙂
Hi @denizyuret,
from the perspective of JLD2, your code looks absolutely OK.
What versions are you on? I can't reproduce the problem.
[052768ef] CUDA v3.13.1 # (haven't upgraded to 4.x yet, but if it solves the JLD2 issue I will)
[5789e2e9] FileIO v1.16.0
[033835bb] JLD2 v0.4.31
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.57.2, for CUDA 11.4
CUDA driver 11.7
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+470.57.2
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
Alas, my hope was short-lived :( I get the same error with CUDA v4.1.2.
I still can't reproduce your error. (I tried Julia 1.8.5 and 1.9.0-rc1 with CUDA 3.13.1 and JLD2 v0.4.31.)
Can you send your CUDA.versioninfo() output so I can see what the difference may be? (Library/driver version, GPU type, etc. could be a factor?)
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.86.1, for CUDA 11.7
CUDA driver 11.7
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.86.1
Downloaded artifact: CUDNN
- CUDNN: 8.30.2 (for CUDA 11.5.0)
Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
I tried JLD2.writeas(), JLD2.wconvert(), and JLD2.rconvert() as you suggested. Now I get the following error message:
AssertionError: refcount != 0
Stacktrace:
[1] _derived_array
@ ~/.julia/packages/CUDA/BbliS/src/array.jl:729 [inlined]
[2] reshape(a::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, dims::Tuple{Int64})
@ CUDA ~/.julia/packages/CUDA/BbliS/src/array.jl:723
[3] reshape
@ ./reshapedarray.jl:117 [inlined]
[4] vec(a::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer})
@ Base ./abstractarraymath.jl:41
[5] (::RNN)(x::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}; batchSizes::Nothing)
@ Knet.Ops20 ~/.julia/packages/Knet/YIFWC/src/ops20/rnn.jl:332
[6] (::RNN)(x::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer})
@ Knet.Ops20 ~/.julia/packages/Knet/YIFWC/src/ops20/rnn.jl:329
[7] (::Chain)(x::Matrix{UInt16})
@ Main ./In[5]:6
[8] tag(tagger::Chain, s::String)
@ Main ./In[29]:6
[9] top-level scope
@ In[30]:1
What is "refcount"? What purpose does it serve? How can one alter its value, if altering it is necessary? You do say above: "they seem similar except for the refcount" — can you elaborate on that?
Finally, if I assign the value read to a global variable in rconvert it works without any errors:
julia> JLD2.rconvert(::Type{CuArray{T,N,D}}, x::JLD2CuArray{T,N}) where {T,N,D} = (y = CuArray(x.array); global dbg = y; y)
julia> d = FileIO.load("foo.jld2")
julia> d["a"]  # works with no problems
This (and also the refcount) makes me think that this is a problem with memory management when creating the CuArray. JLD2 allocates the underlying array, passes it to the CuArray(data) constructor, and then ceases to keep track of it (leading to refcount = 0).
This would explain why the global-scope workaround fixes it.
@denizyuret Could you try a few functions of this type?
function f()
    data = rand(10, 10)  # allocate a CPU array with no other references to it
    CuArray(data)        # copy it to the GPU and return the resulting CuArray
end
@denizyuret Could you try a few functions of this type?
The f() function you suggested works without problems. The refcount of the resulting array is 1.
JLD2 allocates the underlying array and passes it to the CuArray(data) constructor and then ceases to keep track of it. (leading to refcount = 0).
CuArray copies the contents of data (stored in RAM) to GPU memory, and once the GPU array is constructed I don't think it cares about what happens to the RAM array. But I am not sure what refcount is for and how it is set, so I may be talking nonsense. For example, if I manually change the value of refcount to 0, things don't break.
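For reference, this is roughly how one can peek at the field; note that `storage` and `refcount` are internal CUDA.jl fields (names assumed from the CUDA.jl source of this era, and subject to change between versions):

```julia
using CUDA

a = CUDA.rand(Float32, 10)
# internal, undocumented API: the buffer bookkeeping lives on the array's storage.
# For a freshly constructed array the counter should read 1.
a.storage.refcount
```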
@maleadt any idea how refcount=0 may appear and whether it may be the source of our problems?
But I am not sure what refcount is for and how it is set, so I may be talking nonsense.
The refcount field is to keep track of the underlying buffer, so that multiple CuArrays can share the same memory (e.g., when you take a view, or reinterpret an array, or reshape it).
refcount=0 may happen when you're serializing a freed array.
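To illustrate what "a freed array" means here, one can produce that state deliberately with CUDA.unsafe_free! (a sketch; the exact error message may differ by version):

```julia
using CUDA

a = CUDA.rand(Float32, 4)
CUDA.unsafe_free!(a)  # eagerly release the buffer; the refcount drops to 0
# Any further use of `a`, e.g. reshape(a, 2, 2), now fails with an error
# along the lines of the `AssertionError: refcount != 0` shown above.
```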
The refcount field is to keep track of the underlying buffer, so that multiple CuArrays can share the same memory (e.g., when you take a view, or reinterpret an array, or reshape it).
refcount=0 may happen when you're serializing a freed array.
Thank you for this info. It is a bit odd, though: the problem here is most certainly during deserialization. (Otherwise the workarounds above couldn't work.)
Hmm, I was misunderstanding how JLD2 serializes objects. If we're really just calling Array(...) and CuArray(...) (i.e., not serializing CuArray objects directly), I fail to see how we would ever run into refcount=0. FWIW, I also can't reproduce this issue.
Yeah, that's the curious bit. Let me summarize it quickly:
JLD2 attempts to serialize structs by going through their fields. This fails for CuArray since they don't actually contain the data:
using CUDA
import JLD2, FileIO
struct JLD2CuArray{T,N}; array::Array{T,N}; end
JLD2.writeas(::Type{CuArray{T,N,D}}) where {T,N,D} = JLD2CuArray{T,N}
JLD2.wconvert(::Type{JLD2CuArray{T,N}}, x::CuArray{T,N,D}) where {T,N,D} = JLD2CuArray(Array(x))
JLD2.rconvert(::Type{CuArray{T,N,D}}, x::JLD2CuArray{T,N}) where {T,N,D} = CuArray(x.array)
We define a struct JLD2CuArray that contains data that JLD2 can safely store, along with convert methods for both directions (rconvert and wconvert; Base.convert also works, but that is risky with invalidations...).
When you give JLD2 any object, it always asks JLD2.writeas what type to store it as (the default is writeas(::Type{T}) where T = T) and then calls the conversion methods as necessary.
Therefore, with this code, we store the data in Array form AND the full CuArray{T,N,D} type signature (not shown) to call the correct rconvert method upon loading.
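The same writeas/wconvert/rconvert mechanism can be sketched GPU-free with a made-up wrapper type (Celsius and CelsiusOnDisk are hypothetical names for illustration only):

```julia
import JLD2, FileIO

struct Celsius; deg::Float64; end        # the runtime type
struct CelsiusOnDisk; deg::Float64; end  # what JLD2 actually stores

# Tell JLD2 to store Celsius as CelsiusOnDisk, with converters both ways.
JLD2.writeas(::Type{Celsius}) = CelsiusOnDisk
JLD2.wconvert(::Type{CelsiusOnDisk}, x::Celsius) = CelsiusOnDisk(x.deg)
JLD2.rconvert(::Type{Celsius}, x::CelsiusOnDisk) = Celsius(x.deg)

FileIO.save("temp.jld2", "t", Celsius(21.5))
FileIO.load("temp.jld2")["t"]  # a Celsius again, reconstructed via rconvert
```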
The fact that the deserialized object contains a different buffer pointer indicates that the rconvert function has run. This seems to point to a GC-related issue, but if JLD2 is just storing the deserialized object in a regular dictionary, the finalizer shouldn't ever run.
@denizyuret since only you seem to be able to reproduce this, I'd add some logging to the CuArray finalizer that decrements the refcount, to see when and from where it gets run (e.g. by adding sprint(Base.show_backtrace, backtrace()) or so to your log messages).
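The suggested logging could be as simple as a line like the following dropped into the relevant finalizer in CUDA.jl's src/array.jl (the exact location depends on your CUDA.jl version; the message text here is made up):

```julia
# inside the code path that decrements the refcount:
@warn "refcount decremented" trace = sprint(Base.show_backtrace, backtrace())
```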
Here is what I do to be able to save/load CuArrays with JLD2 files:
This used to work with CuArray{T,N} but no longer works with CuArray{T,N,D}. Here is the error I get:
When I compare the original array with the loaded version they seem similar except for the refcount: