AaronGhost closed 9 months ago
Using https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution
it looks like we get stuck on entering GC because `cuOccupancyMaxPotentialBlockSize` is blocking:
unknown function (ip: 0x7ffae4162444)
__pthread_rwlock_wrlock at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7ffa52104347)
unknown function (ip: 0x7ffa51f14b34)
macro expansion at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/libcuda.jl:4848 [inlined]
#705 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/utils/call.jl:27
check at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/libcuda.jl:32 [inlined]
cuOccupancyMaxPotentialBlockSize at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/utils/call.jl:26 [inlined]
#launch_configuration#901 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:59 [inlined]
launch_configuration at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:54 [inlined]
#launch_heuristic#1126 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/gpuarrays.jl:22 [inlined]
launch_heuristic at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/gpuarrays.jl:15 [inlined]
_copyto! at /home/vchuravy/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
materialize! at /home/vchuravy/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:32 [inlined]
materialize! at ./broadcast.jl:911 [inlined]
macro expansion at ./REPL[4]:7 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7ffaccf94692)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
unknown function (ip: 0x7ffae41624ac)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
unknown function (ip: 0x7ffae411770f)
toInt64 at ./boot.jl:703 [inlined]
Int64 at ./boot.jl:784 [inlined]
convert at ./number.jl:7 [inlined]
_promote at ./promotion.jl:370 [inlined]
promote at ./promotion.jl:393 [inlined]
< at ./promotion.jl:462 [inlined]
> at ./operators.jl:378 [inlined]
compute_threads at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/mapreduce.jl:222 [inlined]
call_composed at ./operators.jl:1045 [inlined]
call_composed at ./operators.jl:1044 [inlined]
#_#103 at ./operators.jl:1041 [inlined]
ComposedFunction at ./operators.jl:1041 [inlined]
#902 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:61
unknown function (ip: (nil))
Could you test https://github.com/JuliaGPU/CUDA.jl/pull/2262 and see if it fixes your issue?
Thanks for looking into it! I checked out the branch locally and `dev`ed it, and it still deadlocks on Windows. I am happy to run some diagnostics to track it down further, but I am not sure which commands I need to run. (If I understood correctly, the profiler can't be triggered during execution on Windows, and `@profile` never returns because of the deadlock, unless I am missing something?)
Yeah, Windows makes that harder. If you can somehow get a backtrace for all threads, that would help immensely.
I could reproduce the hang on Linux before, but can't anymore. Maybe you could try WSL?
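On Linux or WSL, the signal-based trigger from the Profile documentation linked earlier in this thread could be scripted roughly as follows. This is only a minimal sketch: `usr1_command` and `request_backtraces` are hypothetical helper names (not part of Julia or CUDA.jl), and it assumes a `kill` binary that accepts `-s USR1`.

```julia
# Hypothetical helpers (not part of Julia or CUDA.jl): ask a running Julia
# process on Linux/WSL to print a stack trace and collect a short profile.

# Build the command without running it, so it can be inspected first.
usr1_command(pid::Integer) = `kill -s USR1 $pid`

# Send SIGUSR1 to the (possibly hung) process. Per the Profile docs, on Linux
# this triggers a stack trace followed by a short profile collection.
function request_backtraces(pid::Integer)
    Sys.islinux() || error("this sketch assumes the Linux SIGUSR1 trigger")
    run(usr1_command(pid))
    return nothing
end
```

Call `request_backtraces(pid)` from a second Julia session, where `pid` is whatever `getpid()` reported in the hung one; the output appears in the terminal of the hung process, not in the session that sends the signal.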
I managed to reproduce a deadlock with WSL. I ran the program with 4 threads this time. The first iteration of `main` completes, but the deadlock happens on the second iteration. I then used the signal method to get the backtrace, which is below. Let me know if I can do anything else to help!
```
signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
_jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/threading.c:927
jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined]
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:286
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
_mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined]
jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined]
ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502
maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350
jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined]
_new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined]
_new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450
Array at ./boot.jl:481 [inlined]
Array at ./boot.jl:488 [inlined]
similar at ./array.jl:420 [inlined]
similar at ./abstractarray.jl:828 [inlined]
_unsafe_getindex at ./multidimensional.jl:901
_getindex at ./multidimensional.jl:889 [inlined]
getindex at ./abstractarray.jl:1288 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f51ba9d6bc2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
task_local_state! at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:69
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f51ba9ca1fc)
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f515f244454)
unknown function (ip: 0x7f515ef80536)
unknown function (ip: 0x7f515ef81233)
unknown function (ip: 0x7f515ef82eae)
unknown function (ip: 0x7f515f067f34)
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:356 [inlined]
#49 at /path/to/local/CUDA.jl/lib/utils/call.jl:27 [inlined]
check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined]
cuMemcpyDtoHAsync_v2 at /path/to/local/CUDA.jl/lib/utils/call.jl:26 [inlined]
#unsafe_copyto!#8 at /path/to/local/CUDA.jl/lib/cudadrv/memory.jl:397 [inlined]
unsafe_copyto! at /path/to/local/CUDA.jl/lib/cudadrv/memory.jl:394
#1055 at /path/to/local/CUDA.jlsrc/array.jl:610
#context!#913 at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:170 [inlined]
context! at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:165 [inlined]
unsafe_copyto! at /path/to/local/CUDA.jlsrc/array.jl:602
copyto! at /path/to/local/CUDA.jlsrc/array.jl:555 [inlined]
getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:50
scalar_getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:34 [inlined]
_getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:17 [inlined]
getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:15 [inlined]
macro expansion at /home/username/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210 [inlined]
#_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:71
_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]
#mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
#_sum#831 at ./reducedim.jl:1015 [inlined]
_sum at ./reducedim.jl:1015 [inlined]
#_sum#830 at ./reducedim.jl:1014 [inlined]
_sum at ./reducedim.jl:1014 [inlined]
#sum#828 at ./reducedim.jl:1010 [inlined]
sum at ./reducedim.jl:1010 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f51ba9d6bc2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
```
```
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
Thread 1 Task 0x00007f50e87044c0 Total snapshots: 385. Utilization: 100%
╎385 @Base/threadingconstructs.jl:153; (::Base.Threads.var"#1#2"{var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, Int64})()
╎ 385 @Base/threadingconstructs.jl:181; #2#threadsfor_fun
╎ 385 @Base/threadingconstructs.jl:214; (::var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}})(tid::Int64; onethread::Bool)
╎ 385 /mnt/d/Documents/Julia/deadlock.jl:9; macro expansion
╎ 385 @Base/reducedim.jl:1010; sum
╎ 385 @Base/reducedim.jl:1010; #sum#828
╎ ╎ 385 @Base/reducedim.jl:1014; _sum
╎ ╎ 385 @Base/reducedim.jl:1014; #_sum#830
╎ ╎ 385 @Base/reducedim.jl:1015; _sum
╎ ╎ 385 @Base/reducedim.jl:1015; #_sum#831
╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:28; mapreduce
╎ ╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:28; #mapreduce#41
╎ ╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:33; _mapreduce
╎ ╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:71; _mapreduce(f::typeof(identity), op::typeof(Base.add_sum), As::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}; dims::Colon, init::Nothing)
╎ ╎ ╎ 385 @GPUArraysCore/src/GPUArraysCore.jl:210; macro expansion
╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:15; getindex(A::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
╎ ╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:17; _getindex
╎ ╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:34; scalar_getindex
╎ ╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:50; getindex(A::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
╎ ╎ ╎ ╎ 385 @CUDA/src/array.jl:555; copyto!
╎ ╎ ╎ ╎ 385 @CUDA/src/array.jl:602; unsafe_copyto!(dest::Vector{ComplexF32}, doffs::Int64, src::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/state.jl:165; context!(ctx::CuContext)
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/state.jl:170; #context!#913
╎ ╎ ╎ ╎ ╎ 385 @CUDA/src/array.jl:610; (::CUDA.var"#1055#1056"{ComplexF32, Vector{ComplexF32}, Int64, CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, Int64, Int64})()
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/memory.jl:394; kwcall(::@NamedTuple{async::Bool}, ::typeof(unsafe_copyto!), dst::Ptr{ComplexF32}, src::CuPtr{ComplexF32}, N::Int64)
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/memory.jl:397; #unsafe_copyto!#8
╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/utils/call.jl:26; cuMemcpyDtoHAsync_v2
╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/libcuda.jl:32; check
╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/utils/call.jl:27; #49
384╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/libcuda.jl:356; macro expansion
Thread 2 Task 0x00007f50e87041a0 Total snapshots: 385. Utilization: 100%
Thread 3 Task 0x00007f50e8704330 Total snapshots: 385. Utilization: 100%
384╎385 @CUDA/lib/cudadrv/state.jl:69; task_local_state!()
Thread 4 Task 0x00007f50e8704650 Total snapshots: 385. Utilization: 100%
╎385 @Base/threadingconstructs.jl:153; (::Base.Threads.var"#1#2"{var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, Int64})()
╎ 385 @Base/threadingconstructs.jl:181; #2#threadsfor_fun
╎ 385 @Base/threadingconstructs.jl:214; (::var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}})(tid::Int64; onethread::Bool)
╎ 385 /mnt/d/Documents/Julia/deadlock.jl:9; macro expansion
╎ 385 @Base/abstractarray.jl:1288; getindex
╎ 385 @Base/multidimensional.jl:889; _getindex
╎ ╎ 385 @Base/multidimensional.jl:901; _unsafe_getindex(::IndexLinear, ::Array{ComplexF32, 5}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64, ::Int64)
╎ ╎ 385 @Base/abstractarray.jl:828; similar
╎ ╎ 385 @Base/array.jl:420; similar
╎ ╎ 385 @Base/boot.jl:488; Array
384╎ ╎ 385 @Base/boot.jl:481; Array
Thread 6 Task 0x00007f50c703c010 Total snapshots: 385. Utilization: 0%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Thread 7 Task 0x00007f50c7004010 Total snapshots: 385. Utilization: 100%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Thread 8 Task 0x00007f50c7000010 Total snapshots: 385. Utilization: 0%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Thread 9 Task 0x00007f50c6ff8010 Total snapshots: 385. Utilization: 100%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
```
Hm but you were able to collect a profile. That means it didn't fully hang at that point. In the other case I never got a collected profile since we never hit a yield point.
I tried the experiment multiple times, sending Ctrl+C / SIGINT. In most cases I get:
==============================================================
Profile collected. A report will print at the next yield point
==============================================================
^C^C^C^C^C^CWARNING: Force throwing a SIGINT
Segmentation fault
And I never get access to the profile. I added below one of the reports I get in this case:
<details><summary>Backtrace</summary>
<p>
signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_gc_safepoint at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:472
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:472
poptask at ./task.jl:985
wait at ./task.jl:994
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f844007231b)
unknown function (ip: 0x7f843fd7da08)
unknown function (ip: 0x7f843fd7e5eb)
unknown function (ip: 0x7f843fe91946)
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4165 [inlined]
check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined]
cuLaunchKernel at /path/to/local/CUDA.jl/lib/utils/call.jl:26 [inlined]
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:33 [inlined]
macro expansion at ./none:0 [inlined]
pack_arguments at ./none:0
launch at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:52 [inlined]
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:135 [inlined]
macro expansion at ./none:0 [inlined]
convert_arguments at ./none:0 [inlined]
cudacall at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:173 [inlined]
macro expansion at /path/to/local/CUDA.jl/src/compiler/execution.jl:266 [inlined]
macro expansion at ./none:0 [inlined]
unknown function (ip: 0x7f849a39e685)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
call at ./none:0 [inlined]
HostKernel at /path/to/local/CUDA.jl/src/compiler/execution.jl:388 [inlined]
macro expansion at /path/to/local/CUDA.jl/src/compiler/execution.jl:114 [inlined]
mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined]
_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]
mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
_sum at ./reducedim.jl:1015 [inlined]
_sum at ./reducedim.jl:1014 [inlined]
sum at ./reducedim.jl:1010 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
unknown function (ip: 0x7f849a3bef52)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
_mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined]
jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined]
ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502
maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350
jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined]
_new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined]
_new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450
Array at ./boot.jl:481 [inlined]
Array at ./boot.jl:488 [inlined]
similar at ./array.jl:420 [inlined]
similar at ./abstractarray.jl:828 [inlined]
_unsafe_getindex at ./multidimensional.jl:901
_getindex at ./multidimensional.jl:889 [inlined]
getindex at ./abstractarray.jl:1288 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
unknown function (ip: 0x7f849a3bef52)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f849a38798c)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
copy at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:75
unknown function (ip: (nil))
</p></details>
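For readers who want to capture dumps like the ones in this thread: the Julia Profile documentation linked at the top describes a signal that makes a running process print a stack trace followed by a short profile. The following is a minimal sketch, assuming Linux (where that signal is `SIGUSR1`, matching the "signal (10): User defined signal 1" header in the dumps) and a `kill` executable on the `PATH`. The helper names `profile_request_cmd` and `request_profile` are made up for illustration.

```julia
# Build the command that asks a running Julia process for a stack dump
# followed by a short profile. Assumption: Linux, where the trigger is SIGUSR1.
profile_request_cmd(pid::Integer) = `kill -USR1 $pid`

# Send the request. `pid` is the process ID of the hung Julia session,
# for example the value it printed with `getpid()` before hanging.
request_profile(pid::Integer) = run(profile_request_cmd(pid))
```

From a second terminal or Julia session, `request_profile(pid)` with the PID of the stuck process should make that process print its report on its own terminal.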
In a few cases on WSL, I manage to get a profile out by repeatedly sending interrupt signals. I assume I happen to interrupt one particular function, but I am really not sure what is going on here. I would say I get something out about once in 10 tries.
```julia
==============================================================
Profile collected. A report will print at the next yield point
==============================================================
^C^C^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: LoadError: InterruptException:
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:931
  [2] wait()
    @ Base ./task.jl:995
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait
    @ Base ./condition.jl:125 [inlined]
  [5] _wait(t::Task)
    @ Base ./task.jl:310
  [6] ^Cthreading_run(fun::var"#39#threadsfor_fun#8"{var"#39#threadsfor_fun#7#9"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, static::Bool)
    @ Base.Threads ./threadingconstructs.jl:166
  [7] macro expansion
    @ ./threadingconstructs.jl:219 [inlined]
  [8] main()
    @ Main /path/to/test_deadlock.jl:7
  [9] top-level scope
    @ /path/to/test_deadlock.jl:15
 [10] include(fname::String)
    @ Base.MainInclude ./client.jl:489
 [11] top-level scope
    @ REPL[2]:1
 [12] top-level scope
    @ /path/to/local/CUDA.jl/src/initialization.jl:206
in expression starting at /path/to/test_deadlock.jl:15
```
Can you try the latest version of the PR (which marks all `ccall`s as gc-safe)?
Thanks. I tried the latest version of the PR and can no longer make my MWE deadlock on WSL or Windows with Julia 1.10. However, my original code, run against that same version, still deadlocks with 1.10 and finishes normally with 1.9.
I reduced that code to a new MWE, which is very similar to the previous one except for the FFT plan. Here are the new MWE and the backtrace obtained on WSL with 8 threads:
```julia
using CUDA
using ChunkSplitters

function main()
    data = rand(ComplexF32, (100, 100, 8, 20, 200))
    cu_result = CUDA.zeros(ComplexF32, (100, 100, 20, 200))
    # one cuFFT plan per thread, indexed by chunk number below
    plans = [CUDA.CUFFT.plan_bfft(CUDA.zeros(ComplexF32, (100, 100, 8)), 1:2) for _ in 1:Threads.nthreads()]
    Threads.@threads for (ichunk, chunk) in enumerate(chunks(axes(data, 5); n=Threads.nthreads()))
        for i in chunk
            for t in axes(data, 4)
                cu_result[:, :, t, i] .= sum(plans[ichunk] * CuArray(data[:, :, :, t, i]))
            end
        end
    end
end

println(getpid())
for i in 1:5
    println("Run $i")
    main()
end
```
``` ====================================================================================== Information request received. A stacktrace will print followed by a 1.0 second profile ====================================================================================== cmd: /home/username/julia-1.10.0/bin/julia 776 running 15 of 15 signal (10): User defined signal 1 pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) multiq_check_empty at ./partr.jl:186 jfptr_multiq_check_empty_75167.1 at /home/username/julia-1.10.0/lib/julia/sys.so (unknown line) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 check_empty at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:340 [inlined] ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:388 poptask at ./task.jl:985 wait at ./task.jl:994 #wait#645 at ./condition.jl:130 wait at ./condition.jl:125 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) _jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/threading.c:927 jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined] ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:286 ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524 poptask at ./task.jl:985 wait at ./task.jl:994 #wait#645 at ./condition.jl:130 wait at ./condition.jl:125 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:351 [inlined] jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:344 [inlined] jl_gc_safe_leave at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:465macro expansion at /path/to/local/CUDA.jl/lib/utils/call.jl:204 [inlined] unchecked_cuStreamSynchronize at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4023 [inlined] #920 at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:126 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:56 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277 ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524 poptask at ./task.jl:985 wait at ./task.jl:994 #wait#645 at ./condition.jl:130 wait at ./condition.jl:125 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcaee2f131b) unknown function (ip: 0x7fcaedfe2021) unknown function (ip: 0x7fcaee0e4546) macro expansion at /path/to/local/CUDA.jl/lib/utils/call.jl:203 [inlined] macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4848 [inlined] #705 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined] cuOccupancyMaxPotentialBlockSize at /path/to/local/CUDA.jl/lib/utils/call.jl:29 unknown function (ip: 0x7fcb27ca9802) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 #launch_configuration#901 at /path/to/local/CUDA.jl/lib/cudadrv/occupancy.jl:75 launch_configuration at /path/to/local/CUDA.jl/lib/cudadrv/occupancy.jl:60 [inlined] #mapreducedim!#1160 at /path/to/local/CUDA.jl/src/mapreduce.jl:236 mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined] #mapreducedim!#1160 at /path/to/local/CUDA.jl/src/mapreduce.jl:274 mapreducedim! 
at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined] #_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:67 _mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined] #mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined] #_sum#831 at ./reducedim.jl:1015 [inlined] _sum at ./reducedim.jl:1015 [inlined] #_sum#830 at ./reducedim.jl:1014 [inlined] _sum at ./reducedim.jl:1014 [inlined] #sum#828 at ./reducedim.jl:1010 [inlined] sum at ./reducedim.jl:1010 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcaee2f131b) unknown function (ip: 0x7fcaedffca08) unknown function (ip: 0x7fcaedffd5eb) unknown function (ip: 0x7fcaee110946) unknown function (ip: 0x7fca7440b04c) unknown function (ip: 0x7fca744302de) unknown function (ip: 0x7fca7443a879) unknown function (ip: 0x7fca744272d9) unknown function (ip: 0x7fca7442b6bc) unknown function (ip: 0x7fca74455bda) unknown function (ip: 0x7fca74408fe9) cufftXtExecDescriptor at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) cufftXtExec at 
/home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) macro expansion at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:229 [inlined] #46 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 [inlined] retry_reclaim at /path/to/local/CUDA.jl/src/pool.jl:370 [inlined] check at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:18 [inlined] cufftExecC2C at /path/to/local/CUDA.jl/lib/utils/call.jl:29 unsafe_execute! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:332 unsafe_execute_trailing! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:401 * at /path/to/local/CUDA.jl/lib/cufft/fft.jl:455 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) #189 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 
unknown function (ip: (nil)) _mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined] jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined] ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502 maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined] jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined] jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350 jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined] _new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined] _new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined] ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450 Array at ./boot.jl:481 [inlined] Array at ./boot.jl:488 [inlined] similar at ./array.jl:420 [inlined] similar at ./abstractarray.jl:828 [inlined] _unsafe_getindex at ./multidimensional.jl:901 _getindex at ./multidimensional.jl:889 [inlined] getindex at ./abstractarray.jl:1288 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 
unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcb27c7e9dc) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277 unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcaee2f131b) unknown function (ip: 0x7fcaedffca08) unknown function (ip: 0x7fcaedffd5eb) unknown function (ip: 0x7fcaee110946) unknown function (ip: 0x7fca7440b04c) unknown function (ip: 0x7fca744302de) unknown function (ip: 0x7fca7443a879) unknown function (ip: 0x7fca744272d9) unknown function (ip: 0x7fca7442b6bc) unknown function (ip: 0x7fca74455bda) unknown function 
(ip: 0x7fca74408fe9) cufftXtExecDescriptor at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) cufftXtExec at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) macro expansion at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:229 [inlined] #46 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 [inlined] retry_reclaim at /path/to/local/CUDA.jl/src/pool.jl:370 [inlined] check at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:18 [inlined] cufftExecC2C at /path/to/local/CUDA.jl/lib/utils/call.jl:29 unsafe_execute! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:332 unsafe_execute_trailing! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:401 * at /path/to/local/CUDA.jl/lib/cufft/fft.jl:455 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at 
/cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:351 [inlined] jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:344 [inlined] jl_gc_safe_leave at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:465unknown function (ip: (nil)) ============================================================== Profile collected. A report will print at the next yield point ============================================================== ```
I extended the PR to cover all libraries, i.e., including cuFFT. Can you test again?
Problem fixed with the latest version! Thank you very much!
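For context on what "gc-safe" means in the exchange above: Julia's collector stops the world, so a collection cannot proceed until every other thread has reached a safepoint or has declared itself GC-safe. A thread blocked inside a foreign call that was not marked GC-safe does neither, which is consistent with the reading earlier in the thread that the hang occurs on entering GC while `cuOccupancyMaxPotentialBlockSize` is blocking. The following is a minimal sketch of the idea and is not taken from the PR. It assumes a POSIX libc whose `sleep` stands in for a blocking driver call, and the wrapper name `gcsafe_sleep` is made up. `jl_gc_safe_enter` and `jl_gc_safe_leave` are Julia runtime entry points (the latter appears in the backtraces above).

```julia
# Run a blocking C call inside a GC-safe region, so that a collection
# started by another thread does not have to wait for this call to return.
# Assumption: POSIX `sleep` from libc stands in for a blocking driver call.
function gcsafe_sleep(seconds::Integer)
    secs = Cuint(seconds)  # convert before entering the region
    # Promise the runtime that this thread will not touch Julia objects.
    gc_state = @ccall jl_gc_safe_enter()::Int8
    remaining = @ccall sleep(secs::Cuint)::Cuint
    # Restore the previous state; the thread waits here if a collection is running.
    @ccall jl_gc_safe_leave(gc_state::Int8)::Cvoid
    return remaining
end
```

Between the two runtime calls the thread must not allocate or otherwise use Julia objects, which is why the argument conversion happens first. With a plain `ccall` in place of the wrapped one, a `GC.gc()` issued from another thread would have to wait until `sleep` returns.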
Describe the bug
Thanks for your work on this library. Some of the code I wrote with multi-threading and CUDA hangs forever under Julia 1.10, whereas it runs correctly under Julia 1.9.4.
I manually reduced the code as far as I could, using differential testing, while still triggering the bug.
To reproduce
The program hangs forever with Julia 1.10 when using 4, 5, 6, 7, or 8 threads (my core count). The Minimal Working Example (MWE) for this bug is:
The program finishes normally with Julia 1.9.
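A comparison like the one above (several thread counts, two Julia versions) can be automated from outside the hanging process. The following is a minimal sketch under stated assumptions: the helper name `hangs`, the timeout values, and the script name `deadlock.jl` in the usage comment are illustrative and not part of the report. A separate process is used because a watchdog task inside the hung session would be stopped along with every other thread.

```julia
# Report whether `cmd` is still running after `timeout` seconds,
# killing it if so. Returns true for a hang, false if the process exited.
function hangs(cmd::Cmd; timeout::Real = 60)
    proc = run(cmd; wait = false)  # start the child without blocking
    status = timedwait(() -> process_exited(proc), timeout)
    if status === :timed_out
        kill(proc, Base.SIGKILL)   # a deadlocked process may not react to SIGTERM
        return true
    end
    return false
end

# Usage sketch (script name is an assumption):
# for n in 4:8
#     cmd = `$(Base.julia_cmd()) --threads=$n deadlock.jl`
#     println(n, " threads: ", hangs(cmd; timeout = 300) ? "hang" : "finished")
# end
```

Note that the helper only distinguishes "exited" from "still running"; a child that crashes quickly is reported as not hanging, so the exit code would need a separate check.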
Manifest.toml
```
[[deps.AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "d92ad398961a3ed262d8bf04a1a2b8340f915fef"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.5.0"

    [deps.AbstractFFTs.extensions]
    AbstractFFTsChainRulesCoreExt = "ChainRulesCore"
    AbstractFFTsTestExt = "Test"

    [deps.AbstractFFTs.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.Adapt]]
deps = ["LinearAlgebra", "Requires"]
git-tree-sha1 = "0fb305e0253fd4e833d486914367a2ee2c2e78d0"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "4.0.1"
weakdeps = ["StaticArrays"]

    [deps.Adapt.extensions]
    AdaptStaticArraysExt = "StaticArrays"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics"]
git-tree-sha1 = "baa8ea7a1ea63316fa3feb454635215773c9c845"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.2.0"

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    SpecialFunctionsExt = "SpecialFunctions"

    [deps.CUDA.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "d01bfc999768f0a31ed36f5d22a76161fc63079c"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.7.0+1"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "2cb12f6b2209f40a4b8967697689a47c50485490"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.2.3"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "8e25c009d2bf16c2c31a70a6e9e8939f7325cc84"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.11.1+0"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "47e4686ec18a9620850bad110b79966132f14283"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "10.0.2"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "ec632f177c0d990e64d955ccc1b8c04c485a0950"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.6"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "a846f297ce9d09ccba02ead0cae70690e072a119"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.25.0"

[[deps.JuliaNVTXCallbacks_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "af433a10f3942e882d3c671aacb203e006a5808f"
uuid = "9c1d0b0a-7046-5b2e-a33f-ea22f176ac7e"
version = "0.2.1+0"

[[deps.KernelAbstractions]]
deps = ["Adapt", "Atomix", "InteractiveUtils", "LinearAlgebra", "MacroTools", "PrecompileTools", "Requires", "SparseArrays", "StaticArrays", "UUIDs", "UnsafeAtomics", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "4e0cb2f5aad44dcfdc91088e85dee4ecb22c791c"
uuid = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
version = "0.9.16"

    [deps.KernelAbstractions.extensions]
    EnzymeExt = "EnzymeCore"

    [deps.KernelAbstractions.weakdeps]
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
git-tree-sha1 = "cb4619f7353fc62a1a22ffa3d7ed9791cfb47ad8"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "6.4.2"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "98eaee04d96d973e79c25d49167668c5c8fb50e2"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.27+1"

[[deps.LLVMLoopInfo]]
git-tree-sha1 = "2e5c102cfc41f48ae4740c7eca7743cc7e7b75ea"
uuid = "8b046642-f1f6-4319-8d3c-209ddc03c586"
version = "1.0.0"

[[deps.LLVMOpenMP_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "d986ce2d884d49126836ea94ed5bfb0f12679713"
uuid = "1d63c593-3942-5779-bab2-d838dc0a180e"
version = "15.0.7+0"

[[deps.NVTX]]
deps = ["Colors", "JuliaNVTXCallbacks_jll", "Libdl", "NVTX_jll"]
git-tree-sha1 = "53046f0483375e3ed78e49190f1154fa0a4083a1"
uuid = "5da4648a-3479-48b8-97b9-01cb529c0a1f"
version = "0.3.4"

[[deps.NVTX_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "ce3269ed42816bf18d500c9f63418d4b0d9f5a3b"
uuid = "e98f9f5b-d649-5603-91fd-7774390e6439"
version = "3.1.0+2"
```
Expected behavior
I expect the program to finish (and cu_result to contain the correct result).
Version info
Details for Julia 1.10
CUDA version with Julia 1.10
Version details with Julia 1.9
Details of Julia 1.9
```
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, rocketlake)
  Threads: 12 on 16 virtual cores
```
Details on CUDA (Julia 1.9.4):
```
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+546.12

Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.3

1 device:
  0: NVIDIA RTX A5000 (sm_86, 20.694 GiB / 23.988 GiB available)
```
Additional context
Thanks very much for your help! Please let me know if I can help further with this!