SpikingNetwork / TrainSpikingNet.jl

train a spiking recurrent neural network
BSD 3-Clause "New" or "Revised" License

Failing CUDA testing with Nvidia GPU (user error). #2

Closed russelljjarvis closed 1 year ago

russelljjarvis commented 1 year ago

(base) rjjarvis@pop-os:~/git/TrainSpikingNet.jl/test$ julia --project=@. runtests.jl Array
ERROR: LoadError: CUDA error (code 804, CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum) @ CUDA ~/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:89
  [2] macro expansion @ ~/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:97 [inlined]
  [3] cuInit @ ~/.julia/packages/CUDA/BbliS/lib/utils/call.jl:26 [inlined]
  [4] __init_driver__() @ CUDA ~/.julia/packages/CUDA/BbliS/src/initialization.jl:84
  [5] libcuda() @ CUDA ~/.julia/packages/CUDA/BbliS/lib/cudadrv/CUDAdrv.jl:142
  [6] macro expansion @ ~/.julia/packages/CUDA/BbliS/lib/cudadrv/libcuda.jl:29 [inlined]
  [7] macro expansion @ ~/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:95 [inlined]
  [8] cuDeviceGet @ ~/.julia/packages/CUDA/BbliS/lib/utils/call.jl:26 [inlined]
  [9] CuDevice @ ~/.julia/packages/CUDA/BbliS/lib/cudadrv/devices.jl:17 [inlined]
 [10] CUDA.TaskLocalState() @ CUDA ~/.julia/packages/CUDA/BbliS/lib/cudadrv/state.jl:50
 [11] task_local_state!() @ CUDA ~/.julia/packages/CUDA/BbliS/lib/cudadrv/state.jl:73
 [12] active_state @ ~/.julia/packages/CUDA/BbliS/lib/cudadrv/state.jl:106 [inlined]
 [13] #_alloc#174 @ ~/.julia/packages/CUDA/BbliS/src/pool.jl:400 [inlined]
 [14] #alloc#173 @ ~/.julia/packages/CUDA/BbliS/src/pool.jl:389 [inlined]
 [15] alloc @ ~/.julia/packages/CUDA/BbliS/src/pool.jl:383 [inlined]
 [16] CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64}) @ CUDA ~/.julia/packages/CUDA/BbliS/src/array.jl:42
 [17] CuArray @ ~/.julia/packages/CUDA/BbliS/src/array.jl:291 [inlined]
 [18] CuArray @ ~/.julia/packages/CUDA/BbliS/src/array.jl:296 [inlined]
 [19] (CuArray{Float64})(xs::Vector{Float64}) @ CUDA ~/.julia/packages/CUDA/BbliS/src/array.jl:303
 [20] top-level scope @ ~/git/TrainSpikingNet.jl/src/gpu/variables.jl:3
 [21] include(fname::String) @ Base.MainInclude ./client.jl:476
 [22] top-level scope @ ~/git/TrainSpikingNet.jl/src/gpu/train.jl:90
in expression starting at /home/rjjarvis/git/TrainSpikingNet.jl/src/gpu/variables.jl:3
in expression starting at /home/rjjarvis/git/TrainSpikingNet.jl/src/gpu/train.jl:90
Array: Error During Test at /home/rjjarvis/git/TrainSpikingNet.jl/test/runtests.jl:61
  Got exception outside of a @test
  failed process: Process(/home/rjjarvis/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/julia -Cnative -J/home/rjjarvis/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/lib/julia/sys.so -g1 /home/rjjarvis/git/TrainSpikingNet.jl/test/../src/gpu/train.jl --nloops 1 --correlation_interval 1 /home/rjjarvis/git/TrainSpikingNet.jl/test/scratch/gpu-Array, ProcessExited(1)) [1]

  Stacktrace:
    [1] pipeline_error @ ./process.jl:565 [inlined]
    [2] (::Base.var"#732#733"{Base.Process})() @ Base ./process.jl:342
    [3] iterate(itr::Base.EachLine{Base.PipeEndpoint}, state::Nothing) @ Base ./io.jl:1062
    [4] _collect(cont::UnitRange{Int64}, itr::Base.EachLine{Base.PipeEndpoint}, #unused#::Base.HasEltype, isz::Base.SizeUnknown) @ Base ./array.jl:725
    [5] collect @ ./array.jl:712 [inlined]
    [6] #readlines#401 @ ./io.jl:588 [inlined]
    [7] readlines @ ./io.jl:588 [inlined]
    [8] compare_cpu_to_gpu(kind::String; ntasks::Int64, nloops::Int64, cinterval::Int64, spikerate::Bool, Pmatrix::Bool, weights::Bool, correlation::Nothing) @ Main ~/git/TrainSpikingNet.jl/test/runtests.jl:22
    [9] compare_cpu_to_gpu(kind::String) @ Main ~/git/TrainSpikingNet.jl/test/runtests.jl:3
   [10] macro expansion @ ~/git/TrainSpikingNet.jl/test/runtests.jl:75 [inlined]
   [11] top-level scope @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Test/src/Test.jl:1439
   [12] include(mod::Module, _path::String) @ Base ./Base.jl:419
   [13] exec_options(opts::Base.JLOptions) @ Base ./client.jl:303
   [14] _start() @ Base ./client.jl:522
Test Summary: | Pass  Error  Total   Time
Array         |    2      1      3  51.2s
Test Summary: | Pass  Error  Total   Time
Array         |    2      1      3  51.3s
ERROR: LoadError: Some tests did not pass: 2 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/rjjarvis/git/TrainSpikingNet.jl/test/runtests.jl:61

caused by: Some tests did not pass: 2 passed, 0 failed, 1 errored, 0 broken.
(base) rjjarvis@pop-os:~/git/TrainSpikingNet.jl/test$
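
The log above contains two stack traces: the CUDA error raised inside the child process running train.jl, and a second one from the test harness, whose readlines call failed because that child exited with status 1. Below is a minimal sketch of that second mechanism using only Base. The helper name run_and_collect and the stand-in command are hypothetical, not taken from runtests.jl.

```julia
# Hypothetical stand-in for the GPU training script: a child Julia process
# that simply exits with status 1, as train.jl did when CUDA failed to initialize.
failing = `$(Base.julia_cmd()) --startup-file=no -e 'exit(1)'`

# Run a command, collect its stdout lines, and report whether the child failed.
# Reading from a command whose process exits non-zero throws a ProcessFailedException.
function run_and_collect(cmd::Cmd)
    try
        return (ok = true, lines = readlines(cmd))
    catch err
        err isa ProcessFailedException || rethrow()
        return (ok = false, lines = String[])
    end
end

result = run_and_collect(failing)
println(result.ok ? "child succeeded" : "child failed; its own stderr holds the root cause")
```

Read this way, the harness trace is only a symptom. The first trace (CUDA error code 804) is the one to debug.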

russelljjarvis commented 1 year ago

I often have to run

rm -r scratch

between re-runs of the unit tests. I wonder whether there is a way to make Julia clean out the scratch directories whenever the unit tests are restarted?
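
One way to avoid the manual step would be to have the test script reset the directory itself before it runs. A minimal sketch using only Base functions follows. The helper name reset_scratch is hypothetical, and the demonstration uses a temporary location, not the real test/scratch folder.

```julia
# Remove a scratch directory if it exists, then recreate it empty.
# `force = true` makes rm a no-op when the directory is missing.
function reset_scratch(dir::AbstractString)
    rm(dir; force = true, recursive = true)
    mkpath(dir)
    return dir
end

# Demonstration in a temporary location. In a test script the argument
# would more likely be something like joinpath(@__DIR__, "scratch") (an assumption).
scratch = joinpath(mktempdir(), "scratch")
mkpath(scratch)
write(joinpath(scratch, "stale.txt"), "left over from a previous run")

reset_scratch(scratch)
println(readdir(scratch))
```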

mkitti commented 1 year ago

There is https://github.com/JuliaPackaging/Scratch.jl

bjarthur commented 1 year ago

do CUDA.jl's tests pass? that is, if you cd into the trainspikingnet directory, and then from the julia REPL do ] test CUDA, are any errors thrown?

and as always with any bug report, it's helpful to know what OS you're using, what version of julia, which graphics card, etc.
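
A quick way to gather most of that information is to capture the output of versioninfo() from the InteractiveUtils standard library. A small sketch follows; the function name environment_report is made up for illustration.

```julia
using InteractiveUtils  # standard library; provides versioninfo()

# Capture the Julia version, OS, and CPU details as a string for pasting into a report.
function environment_report()
    io = IOBuffer()
    versioninfo(io)
    return String(take!(io))
end

print(environment_report())

# With CUDA.jl installed, `using CUDA; CUDA.versioninfo()` additionally prints
# driver, toolkit, and GPU details.
```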

mkitti commented 1 year ago

> if you cd into the trainspikingnet directory

Did you mean to activate a project in that directory?
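
The distinction matters because changing directory does not by itself change which project Julia uses. The project is selected with --project (as in julia --project=@.) or with Pkg.activate. A small sketch for checking which project is active; the temporary directory is only a stand-in for the repository checkout.

```julia
using Pkg

# Stand-in for the repository checkout (an assumption for this sketch).
project_dir = mktempdir()

# Activate the project rooted at that directory. This is the scripted
# equivalent of `] activate <dir>` in the REPL.
Pkg.activate(project_dir)

# Base.active_project() returns the path of the project file currently in use.
active = Base.active_project()
println(active)
```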

russelljjarvis commented 1 year ago

> do CUDA.jl's tests pass? that is, if you cd into the trainspikingnet directory, and then from the julia REPL do ] test CUDA, are any errors thrown?
>
> and as always with any bug report, it's helpful to know what OS you're using, what version of julia, which graphics card, etc.

I did start Pkg.test("CUDA") (that is, ] test CUDA), but unfortunately I quit before all the tests had completed. I will redo that today and post the update here. I have two Nvidia-compatible Linux environments I can test on.

russelljjarvis commented 1 year ago

> if you cd into the trainspikingnet directory
>
> Did you mean to activate a project in that directory?

I believe I followed the README.md instructions faithfully, which included cd'ing into TrainSpikingNet. I will start from scratch again in case it is human error.

russelljjarvis commented 1 year ago

Okay it was probably just a human error, sorry.

I did the recommended ] test CUDA, which essentially passed.

Test Summary: |  Pass  Broken  Total  Time
Overall       | 17020       5  17025
SUCCESS Testing CUDA tests passed

I then followed the README instructions. I believe I departed from them the first time around by substituting git clone --depth 1 for a simple git clone. This time the tests pass:

(base) rjjarvis@pop-os:~/build_testing/depth1/TrainSpikingNet.jl/test$ julia --project=@. runtests.jl
  Downloaded artifact: CUDA
Test Summary: | Pass  Total     Time
Array         |    6      6  2m27.5s
Test Summary: | Pass  Total     Time
Symmetric     |    7      7  1m28.7s
Test Summary:   | Pass  Total     Time
SymmetricPacked |    7      7  1m27.9s
Test Summary:      | Pass  Total     Time
pree=0.1, sig=0.65 |    6      6  1m29.2s
Test Summary:     | Pass  Total     Time
pree=0.1, sig=0.0 |    6      6  1m27.4s
Test Summary:      | Pass  Total     Time
pree=0.0, sig=0.65 |    6      6  1m30.7s
Test Summary:     | Pass  Total     Time
pree=0.0, sig=0.0 |    6      6  1m29.0s
Test Summary:       | Pass  Total     Time
voltage noise model |    6      6  1m29.0s
Test Summary: | Pass  Total   Time
Ricciardi     |    7      7  22.0s
Test Summary: | Pass  Total     Time
Int16         |    3      3  1m15.1s
Test Summary: | Pass  Total     Time
feed forward  |    6      6  1m33.2s
Test Summary:  | Pass  Total     Time
multiple tasks |    6      6  1m27.7s
Test Summary: | Pass  Total     Time
test          |    4      4  2m49.3s
Test Summary: | Pass  Total     Time
learns        |    2      2  6m02.4s
Test Summary: | Pass  Total     Time
GLIF1         |    2      2  1m28.9s
Test Summary: | Pass  Total     Time
GLIF2         |    2      2  1m30.5s
Test Summary: | Pass  Total     Time
GLIF3         |    2      2  1m39.8s
Test Summary: | Pass  Total     Time
GLIF4         |    2      2  1m42.1s
Test Summary: | Pass  Total     Time
GLIF5         |    2      2  1m40.9s

bjarthur commented 1 year ago

glad it works, and good to see you've upgraded to the latest version of TrainSpikingNet too!

russelljjarvis commented 1 year ago

> There is https://github.com/JuliaPackaging/Scratch.jl

Do you think it would be easy to implement here?

bjarthur commented 1 year ago

a new commit now deletes the scratch folder when testing
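
For readers who want similar behaviour in their own test suites, here is a sketch of one possible approach. It is not necessarily what the commit does. It relies on the do-block form of mktempdir from Base, which removes the directory once the block finishes, and the helper name with_scratch is hypothetical.

```julia
# Run `work` with a fresh scratch directory that is deleted afterwards.
function with_scratch(work::Function)
    mktempdir() do dir
        work(dir)
    end
end

# Write a placeholder output file inside the scratch directory, then return
# the directory path so we can check afterwards that it has been removed.
leftover = with_scratch() do dir
    write(joinpath(dir, "weights.txt"), "placeholder output")
    dir
end

println(isdir(leftover) ? "scratch directory still present" : "scratch directory removed")
```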

russelljjarvis commented 1 year ago

That is really awesome too!