JuliaAttic / CUDArt.jl

Julia wrapper for CUDA runtime API

Spurious failures in cudacopy! with "invalid argument" error #18

Open ikirill opened 9 years ago

ikirill commented 9 years ago

This is probably related to #17 and how finalizers work.

The following function:

function test1()
  devices(dev->true, nmax=1) do devlist
    dev = devlist[1]
    device(dev)

    sz = (801, 802)
    x = CudaPitchedArray[]
    for i=1:10
      push!(x, CudaPitchedArray(Float32, sz))
    end
    for i=1:10
      @time for j=1:length(x)
        to_host(x[j])
      end
      println("Finished iteration $i.")
    end
  end
end

produces the following error the second time it is run. If I run gc() in between the two runs, there is no error (the workaround is spelled out right after the trace below).

julia> versioninfo()
Julia Version 0.4.0-dev+2876
Commit f164ac1 (2015-01-22 22:58 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> Test.test1()
elapsed time: 0.014716371 seconds (25716080 bytes allocated)
Finished iteration 1.
elapsed time: 0.071074413 seconds (25716080 bytes allocated, 74.17% gc time)
Finished iteration 2.
elapsed time: 0.010117429 seconds (25716080 bytes allocated)
Finished iteration 3.
elapsed time: 0.063922907 seconds (25716080 bytes allocated, 81.96% gc time)
Finished iteration 4.
elapsed time: 0.010150908 seconds (25716080 bytes allocated)
Finished iteration 5.
elapsed time: 0.06263613 seconds (25716080 bytes allocated, 83.62% gc time)
Finished iteration 6.
elapsed time: 0.010153057 seconds (25716080 bytes allocated)
Finished iteration 7.
elapsed time: 0.062723099 seconds (25716080 bytes allocated, 83.70% gc time)
Finished iteration 8.
elapsed time: 0.06263111 seconds (25716080 bytes allocated, 83.61% gc time)
Finished iteration 9.
elapsed time: 0.01013381 seconds (25716080 bytes allocated)
Finished iteration 10.

julia> Test.test1()
WARNING: CUDA error triggered from:

 in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:15
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:313
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:288
 in copy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:282
 in to_host at /***/.julia/v0.4/CUDArt/src/arrays.jl:87
 in anonymous at /***/test.jl:17
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:57
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:49
 in test1 at /***/test.jl:6
ERROR: "invalid argument"
 in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:16
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:313
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:288
 in copy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:282
 in to_host at /***/.julia/v0.4/CUDArt/src/arrays.jl:87
 in anonymous at /***/test.jl:17
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:57
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:49
 in test1 at /***/test.jl:6
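
For completeness, the gc() workaround mentioned above is just this (a sketch of the REPL session; outputs omitted):

julia> Test.test1()   # first call completes normally

julia> gc()           # force a full collection between the two runs

julia> Test.test1()   # with the gc() in between, this second call no longer errors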

This is with the git-head version of CUDArt. It probably has something to do with a garbage-collection pass trying to free a CUDA pointer that came from a previous device context (allocated before device_reset called cudaDeviceReset), so that the pointer is invalid in the new device context.
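
To spell out the failure mode I am guessing at (this is speculation on my part, not something I have verified in the CUDArt sources):

device(0)
p = CudaPitchedArray(Float32, (801, 802))  # device pointer registered, finalizer attached
device_reset(0)                            # cudaDeviceReset destroys that context
device(0)                                  # fresh context
gc()                                       # if p's finalizer somehow survived the reset,
                                           # it would now free a stale pointer and the
                                           # runtime would report "invalid argument"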

This is very irritating when testing CUDA code in the REPL, where the same function gets run over and over again (and sometimes doesn't even run correctly), so resetting everything properly between runs is a must.

timholy commented 9 years ago

I can't replicate this: I ran test1() 120 times in a row without a single error. It also doesn't seem to make sense, because all the finalizers should run at the end of test1: when the devices context closes, it runs a device_reset (https://github.com/JuliaGPU/CUDArt.jl/blob/35e9e4922eba7ed4bd3154b520e6716f345195c1/src/device.jl#L90-L97). So that should be equivalent to running gc().
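
For reference, the pattern in question is just the do-block form, with the teardown firing when the block exits (a rough paraphrase of the behavior described above, not the actual device.jl source):

devices(dev->true, nmax=1) do devlist
    device(devlist[1])
    a = CudaPitchedArray(Float32, (801, 802))
    to_host(a)
end  # on exit, devices() calls device_reset for each device, which should
     # free every registered pointer and then call cudaDeviceReset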

julia> versioninfo()
Julia Version 0.4.0-dev+2954
Commit 8e1e310* (2015-01-28 14:07 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Things to try:

ikirill commented 9 years ago

Before the failure it looks like this (I printed cuda_ptrs and the copy parameters inside cudacopy!). I find it reliably reproducible, but it disappears when I use copy! instead of to_host (see the sketch after the trace below).

Finished iteration 2.
CUDArt.cuda_ptrs = Dict{Any,Int64}(Ptr{Void} @0x0000004200100000=>0,Ptr{Void} @0x0000004200c00000=>0,Ptr{Void} @0x0000004200940000=>0,Ptr{Void} @0x00000042003c0000=>0,Ptr{Void} @0x0000004201180000=>0,Ptr{Void} @0x0000004201440000=>0,Ptr{Void} @0x0000004201700000=>0,Ptr{Void} @0x00000042019c0000=>0,Ptr{Void} @0x0000004200680000=>0,Ptr{Void} @0x0000004200ec0000=>0)
params = [rt.cudaMemcpy3DParms(C_NULL,srcpos,pitchedptr(src),C_NULL,dstpos,pitchedptr(dst),ext,cudamemcpykind(dst,src))] = [CUDArt.CUDArt_gen.cudaMemcpy3DParms(Ptr{Void} @0x0000000000000000,CUDArt.CUDArt_gen.cudaPos(0x0000000000000000,0x0000000000000000,0x0000000000000000),CUDArt.CUDArt_gen.cudaPitchedPtr(Ptr{Void} @0x0000004200100000,0x0000000000000e00,0x0000000000000321,0x0000000000000322),Ptr{Void} @0x0000000000000000,CUDArt.CUDArt_gen.cudaPos(0x0000000000000000,0x0000000000000000,0x0000000000000000),CUDArt.CUDArt_gen.cudaPitchedPtr(Ptr{Void} @0x000000000d4a8ff0,0x0000000000000c84,0x0000000000000321,0x0000000000000322),CUDArt.CUDArt_gen.cudaExtent(0x0000000000000c84,0x0000000000000322,0x0000000000000001),0x00000002)]
WARNING: CUDA error triggered from:

 in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:15
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:100
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:288
 in copy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:282
 in to_host at /***/.julia/v0.4/CUDArt/src/arrays.jl:87
 in anonymous at /***/test.jl:53
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:67
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:59
 in test1 at /***/test.jl:7
ERROR: "invalid argument"
 in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:16
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:100
 in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:288
 in copy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:282
 in to_host at /***/.julia/v0.4/CUDArt/src/arrays.jl:87
 in anonymous at /***/test.jl:53
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:67
 in devices at /***/.julia/v0.4/CUDArt/src/device.jl:59
 in test1 at /***/test.jl:7
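
For reference, the copy!-based variant that avoids the error looks roughly like this (a sketch; I am assuming copy!(::Array, ::CudaPitchedArray) is the method that to_host wraps, which the trace above suggests):

sz = (801, 802)
x = [CudaPitchedArray(Float32, sz) for i = 1:10]
h = Array(Float32, sz)      # preallocated host buffer, reused across iterations
for j = 1:length(x)
    copy!(h, x[j])          # same device-to-host transfer, but no fresh host array per call
end
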
ikirill commented 9 years ago

If I surround the to_host call with gc_disable()/gc_enable(), there is also no crash, so this probably isn't to do with finalizers.
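
Concretely, inside the loop in test1 it becomes something like this (a sketch, using the gc_disable()/gc_enable() functions that exist on 0.4):

gc_disable()            # keep the collector (and finalizers) from running mid-copy
try
    h = to_host(x[j])
finally
    gc_enable()
end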

timholy commented 9 years ago

Very strange. I get this:

julia> CUDArt.cuda_ptrs
Dict{Any,Int64} with 0 entries

because the cleanup code in device_reset should remove all the entries from the dict. Why isn't that happening in your case?

ikirill commented 9 years ago

The message gets printed before the error in to_host, so that's cuda_ptrs just before the crash.

timholy commented 9 years ago

Ah, I misunderstood when you were running it. That's informative too (but also worth checking: you should have an empty dict right after you successfully complete test1()).

I can't see anything that looks wrong with that output, and I'm pretty baffled overall. Does Pkg.test("CUDArt") pass for you? What does nvcc --version say? How about the deviceQuery test?

ikirill commented 9 years ago
julia> Pkg.test("CUDArt")
INFO: Testing CUDArt
juliarc = "/***/.juliarc.jl"
INFO: CUDArt tests passed
INFO: No packages to install, update or remove

~ $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

~ $ ./deviceQuery

Detected 4 CUDA Capable device(s)

Device 0: "Tesla M2090"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
<snip...>

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2090, Device1 = Tesla M2090, Device2 = Tesla M2090, Device3 = Tesla M2090
Result = PASS

I should point out that I don't have direct control over the machine here, it's more like a departmental server that I can use.

I notice the runtime version is 6.0, while the wrappers were generated for 6.5. I don't think this is the cause, because the relevant parts of the API are probably the same.

At first I thought maybe it's some pointer alignment issue (the host pointer is not divisible by 128 = 0x80), but the other host pointers are also not aligned and do not cause errors.

ikirill commented 9 years ago

In that example, the device is still initialized (i.e. cudaDeviceReset was not called, which is a real problem):

function isInitialized(dev)
  device(dev)
  try
    # cudaSetDeviceFlags only succeeds before a context exists on the device,
    # so an error here means the device still has an active (non-reset) context
    CUDArt.rt.cudaSetDeviceFlags(0)
    return false
  catch ex
    @show ex
    return true
  end
end

julia> Test.isInitialized(0)
WARNING: CUDA error triggered from:

 in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:15
 in isInitialized at /***/test.jl:8
ex = "cannot set while device is active in this process"
true