JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Multi-threaded code hanging forever with Julia 1.10 #2261

Closed AaronGhost closed 9 months ago

AaronGhost commented 9 months ago

Describe the bug

Thanks for your work on this library. Some of the code I wrote with multi-threading and CUDA hangs forever when using Julia 1.10, although it runs correctly with Julia 1.9.4.

I manually reduced the code to the best of my ability using differential testing, while still triggering the bug.

To reproduce

The program hangs forever when using 4, 5, 6, 7 and 8 threads (my core count) with julia 1.10. The Minimal Working Example (MWE) for this bug is:

using CUDA

function main()
    data = rand(ComplexF32, (100, 100, 8, 20, 200))
    cu_result = CUDA.zeros(ComplexF32, (100, 100, 20, 200))

    Threads.@threads for i in axes(data, 5)
        for t in axes(data, 4)
            cu_result[:, :, t, i] .= sum(CuArray(data[:, :, :, t, i]))
        end
    end
end

println("Starting first iteration")
main()
println("First iteration finished")
main()
println("Second iteration finished")

The program finishes normally with Julia 1.9.4.
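Since the hang is reported only for certain thread counts, it can help to confirm how many threads the session really has before running the MWE. The following is a minimal sketch, assuming Julia was started with the standard `--threads N` flag (or `JULIA_NUM_THREADS=N`); `describe_session` is a hypothetical helper name, not part of CUDA.jl.

```julia
# Sketch: report the Julia version and thread count before running the MWE.
# Assumption: Julia was started with `--threads N` (or JULIA_NUM_THREADS=N).
using Base.Threads: nthreads

# Hypothetical helper, not part of CUDA.jl or Base.
function describe_session()
    return "Julia $(VERSION) with $(nthreads()) thread(s)"
end

println(describe_session())

# The report above mentions hangs with 4 to 8 threads; a single-threaded
# session is not the configuration described in this issue.
if nthreads() == 1
    @warn "Only one thread is active; the hang was reported with 4 to 8 threads."
end
```

Placing this at the top of the script makes each run's configuration visible in the output, which is useful when comparing Julia 1.9.4 and 1.10 side by side.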

Manifest.toml

```
[[deps.AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "d92ad398961a3ed262d8bf04a1a2b8340f915fef"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.5.0"

    [deps.AbstractFFTs.extensions]
    AbstractFFTsChainRulesCoreExt = "ChainRulesCore"
    AbstractFFTsTestExt = "Test"

    [deps.AbstractFFTs.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.Adapt]]
deps = ["LinearAlgebra", "Requires"]
git-tree-sha1 = "0fb305e0253fd4e833d486914367a2ee2c2e78d0"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "4.0.1"
weakdeps = ["StaticArrays"]

    [deps.Adapt.extensions]
    AdaptStaticArraysExt = "StaticArrays"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics"]
git-tree-sha1 = "baa8ea7a1ea63316fa3feb454635215773c9c845"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.2.0"

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    SpecialFunctionsExt = "SpecialFunctions"

    [deps.CUDA.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "d01bfc999768f0a31ed36f5d22a76161fc63079c"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.7.0+1"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "2cb12f6b2209f40a4b8967697689a47c50485490"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.2.3"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "8e25c009d2bf16c2c31a70a6e9e8939f7325cc84"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.11.1+0"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "47e4686ec18a9620850bad110b79966132f14283"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "10.0.2"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "ec632f177c0d990e64d955ccc1b8c04c485a0950"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.6"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "a846f297ce9d09ccba02ead0cae70690e072a119"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.25.0"

[[deps.JuliaNVTXCallbacks_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "af433a10f3942e882d3c671aacb203e006a5808f"
uuid = "9c1d0b0a-7046-5b2e-a33f-ea22f176ac7e"
version = "0.2.1+0"

[[deps.KernelAbstractions]]
deps = ["Adapt", "Atomix", "InteractiveUtils", "LinearAlgebra", "MacroTools", "PrecompileTools", "Requires", "SparseArrays", "StaticArrays", "UUIDs", "UnsafeAtomics", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "4e0cb2f5aad44dcfdc91088e85dee4ecb22c791c"
uuid = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
version = "0.9.16"

    [deps.KernelAbstractions.extensions]
    EnzymeExt = "EnzymeCore"

    [deps.KernelAbstractions.weakdeps]
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
git-tree-sha1 = "cb4619f7353fc62a1a22ffa3d7ed9791cfb47ad8"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "6.4.2"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "98eaee04d96d973e79c25d49167668c5c8fb50e2"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.27+1"

[[deps.LLVMLoopInfo]]
git-tree-sha1 = "2e5c102cfc41f48ae4740c7eca7743cc7e7b75ea"
uuid = "8b046642-f1f6-4319-8d3c-209ddc03c586"
version = "1.0.0"

[[deps.LLVMOpenMP_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "d986ce2d884d49126836ea94ed5bfb0f12679713"
uuid = "1d63c593-3942-5779-bab2-d838dc0a180e"
version = "15.0.7+0"

[[deps.NVTX]]
deps = ["Colors", "JuliaNVTXCallbacks_jll", "Libdl", "NVTX_jll"]
git-tree-sha1 = "53046f0483375e3ed78e49190f1154fa0a4083a1"
uuid = "5da4648a-3479-48b8-97b9-01cb529c0a1f"
version = "0.3.4"

[[deps.NVTX_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "ce3269ed42816bf18d500c9f63418d4b0d9f5a3b"
uuid = "e98f9f5b-d649-5603-91fd-7774390e6439"
version = "3.1.0+2"
```

Expected behavior

I expect the program to finish (and cu_result to contain the correct result).

Version info

Details for Julia 1.10

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, rocketlake)
  Threads: 1 on 16 virtual cores

CUDA version with Julia 1.10

CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+546.12

Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7

1 device:
  0: NVIDIA RTX A5000 (sm_86, 18.249 GiB / 23.988 GiB available)

Version details with Julia 1.9

Details of Julia 1.9

```
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, rocketlake)
  Threads: 12 on 16 virtual cores
```

Details on CUDA (Julia 1.9.4):

```
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+546.12

Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.3

1 device:
  0: NVIDIA RTX A5000 (sm_86, 20.694 GiB / 23.988 GiB available)
```

Additional context

Thanks very much for your help! Please let me know if I can help further with this!

vchuravy commented 9 months ago

Using https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution, it looks like we get stuck on entering GC because cuOccupancyMaxPotentialBlockSize is blocking:

unknown function (ip: 0x7ffae4162444)
__pthread_rwlock_wrlock at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7ffa52104347)
unknown function (ip: 0x7ffa51f14b34)
macro expansion at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/libcuda.jl:4848 [inlined]
#705 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/utils/call.jl:27
check at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/libcuda.jl:32 [inlined]
cuOccupancyMaxPotentialBlockSize at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/utils/call.jl:26 [inlined]
#launch_configuration#901 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:59 [inlined]
launch_configuration at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:54 [inlined]
#launch_heuristic#1126 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/gpuarrays.jl:22 [inlined]
launch_heuristic at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/gpuarrays.jl:15 [inlined]
_copyto! at /home/vchuravy/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
materialize! at /home/vchuravy/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:32 [inlined]
materialize! at ./broadcast.jl:911 [inlined]
macro expansion at ./REPL[4]:7 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7ffaccf94692)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238

unknown function (ip: (nil))
unknown function (ip: 0x7ffae41624ac)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
unknown function (ip: 0x7ffae411770f)
toInt64 at ./boot.jl:703 [inlined]
Int64 at ./boot.jl:784 [inlined]
convert at ./number.jl:7 [inlined]
_promote at ./promotion.jl:370 [inlined]
promote at ./promotion.jl:393 [inlined]
< at ./promotion.jl:462 [inlined]
> at ./operators.jl:378 [inlined]
compute_threads at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/mapreduce.jl:222 [inlined]
call_composed at ./operators.jl:1045 [inlined]
call_composed at ./operators.jl:1044 [inlined]
#_#103 at ./operators.jl:1041 [inlined]
ComposedFunction at ./operators.jl:1041 [inlined]
#902 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:61
unknown function (ip: (nil))

vchuravy commented 9 months ago

Could you test https://github.com/JuliaGPU/CUDA.jl/pull/2262 and see if it fixes your issue?

AaronGhost commented 9 months ago

Thanks for looking into it! I checked out the branch locally and dev'ed it, and it still deadlocks on Windows. I am happy to run some diagnostics to track it down further, but I am not sure which commands I need to run. (If I understood correctly, the profiler can't be triggered during execution on Windows, and @profile never returns because of the deadlock, unless I am missing something?)

vchuravy commented 9 months ago

Yeah, Windows makes that harder. If you can somehow get a backtrace for all threads, that would help immensely.

I could reproduce the hang on Linux before, but can't anymore. Maybe you could try WSL?
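For anyone trying this on Linux or WSL: the mechanism linked earlier in the thread (profiling triggered during execution) is driven by sending SIGUSR1 to the running Julia process. A minimal sketch of doing that from a second Julia session follows. It assumes a `kill` executable on the PATH; `backtrace_request` is a hypothetical helper name, and the pid shown is a placeholder.

```julia
# Sketch: build the command that asks a running Julia process for a
# backtrace/profile report. Assumptions: Linux or WSL, where Julia reacts to
# SIGUSR1, and a `kill` executable on the PATH.
# `backtrace_request` is a hypothetical helper, not a CUDA.jl or Base function.
backtrace_request(pid::Integer) = `kill -USR1 $pid`

# In the session that is about to run the MWE, print the process id first:
println("pid of this Julia process: ", getpid())

# Then, from a second Julia session, send the signal to that pid:
# run(backtrace_request(12345))   # 12345 is a placeholder pid
```

The helper only builds the command; `run` is left commented out so that nothing is signalled by accident.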

AaronGhost commented 9 months ago

I managed to reproduce a deadlock with WSL. I ran the program with 4 threads this time. The first iteration of main completes but the deadlock happens on the second iteration. I then used the signal method to get the backtrace. The backtrace is below. Let me know if I can do anything else to help!

Backtrace

```
signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
_jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/threading.c:927
jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined]
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:286
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
_mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined]
jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined]
ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502
maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350
jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined]
_new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined]
_new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450
Array at ./boot.jl:481 [inlined]
Array at ./boot.jl:488 [inlined]
similar at ./array.jl:420 [inlined]
similar at ./abstractarray.jl:828 [inlined]
_unsafe_getindex at ./multidimensional.jl:901
_getindex at ./multidimensional.jl:889 [inlined]
getindex at ./abstractarray.jl:1288 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f51ba9d6bc2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
task_local_state! at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:69
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f51ba9ca1fc)
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f515f244454)
unknown function (ip: 0x7f515ef80536)
unknown function (ip: 0x7f515ef81233)
unknown function (ip: 0x7f515ef82eae)
unknown function (ip: 0x7f515f067f34)
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:356 [inlined]
#49 at /path/to/local/CUDA.jl/lib/utils/call.jl:27 [inlined]
check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined]
cuMemcpyDtoHAsync_v2 at /path/to/local/CUDA.jl/lib/utils/call.jl:26 [inlined]
#unsafe_copyto!#8 at /path/to/local/CUDA.jl/lib/cudadrv/memory.jl:397 [inlined]
unsafe_copyto! at /path/to/local/CUDA.jl/lib/cudadrv/memory.jl:394
#1055 at /path/to/local/CUDA.jlsrc/array.jl:610
#context!#913 at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:170 [inlined]
context! at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:165 [inlined]
unsafe_copyto! at /path/to/local/CUDA.jlsrc/array.jl:602
copyto! at /path/to/local/CUDA.jlsrc/array.jl:555 [inlined]
getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:50
scalar_getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:34 [inlined]
_getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:17 [inlined]
getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:15 [inlined]
macro expansion at /home/username/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210 [inlined]
#_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:71
_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]
#mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
#_sum#831 at ./reducedim.jl:1015 [inlined]
_sum at ./reducedim.jl:1015 [inlined]
#_sum#830 at ./reducedim.jl:1014 [inlined]
_sum at ./reducedim.jl:1014 [inlined]
#sum#828 at ./reducedim.jl:1010 [inlined]
sum at ./reducedim.jl:1010 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f51ba9d6bc2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
```

Collected profile

```
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
Thread 1 Task 0x00007f50e87044c0 Total snapshots: 385. Utilization: 100%
╎385 @Base/threadingconstructs.jl:153; (::Base.Threads.var"#1#2"{var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, Int64})()
╎ 385 @Base/threadingconstructs.jl:181; #2#threadsfor_fun
╎ 385 @Base/threadingconstructs.jl:214; (::var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}})(tid::Int64; onethread::Bool)
╎ 385 /mnt/d/Documents/Julia/deadlock.jl:9; macro expansion
╎ 385 @Base/reducedim.jl:1010; sum
╎ 385 @Base/reducedim.jl:1010; #sum#828
╎ ╎ 385 @Base/reducedim.jl:1014; _sum
╎ ╎ 385 @Base/reducedim.jl:1014; #_sum#830
╎ ╎ 385 @Base/reducedim.jl:1015; _sum
╎ ╎ 385 @Base/reducedim.jl:1015; #_sum#831
╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:28; mapreduce
╎ ╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:28; #mapreduce#41
╎ ╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:33; _mapreduce
╎ ╎ ╎ 385 @GPUArrays/src/host/mapreduce.jl:71; _mapreduce(f::typeof(identity), op::typeof(Base.add_sum), As::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}; dims::Colon, init::Nothing)
╎ ╎ ╎ 385 @GPUArraysCore/src/GPUArraysCore.jl:210; macro expansion
╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:15; getindex(A::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
╎ ╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:17; _getindex
╎ ╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:34; scalar_getindex
╎ ╎ ╎ ╎ 385 @GPUArrays/src/host/indexing.jl:50; getindex(A::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
╎ ╎ ╎ ╎ 385 @CUDA/src/array.jl:555; copyto!
╎ ╎ ╎ ╎ 385 @CUDA/src/array.jl:602; unsafe_copyto!(dest::Vector{ComplexF32}, doffs::Int64, src::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/state.jl:165; context!(ctx::CuContext)
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/state.jl:170; #context!#913
╎ ╎ ╎ ╎ ╎ 385 @CUDA/src/array.jl:610; (::CUDA.var"#1055#1056"{ComplexF32, Vector{ComplexF32}, Int64, CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, Int64, Int64})()
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/memory.jl:394; kwcall(::@NamedTuple{async::Bool}, ::typeof(unsafe_copyto!), dst::Ptr{ComplexF32}, src::CuPtr{ComplexF32}, N::Int64)
╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/memory.jl:397; #unsafe_copyto!#8
╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/utils/call.jl:26; cuMemcpyDtoHAsync_v2
╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/libcuda.jl:32; check
╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/utils/call.jl:27; #49
384╎ ╎ ╎ ╎ ╎ ╎ 385 @CUDA/lib/cudadrv/libcuda.jl:356; macro expansion
Thread 2 Task 0x00007f50e87041a0 Total snapshots: 385. Utilization: 100%
Thread 3 Task 0x00007f50e8704330 Total snapshots: 385. Utilization: 100%
384╎385 @CUDA/lib/cudadrv/state.jl:69; task_local_state!()
Thread 4 Task 0x00007f50e8704650 Total snapshots: 385. Utilization: 100%
╎385 @Base/threadingconstructs.jl:153; (::Base.Threads.var"#1#2"{var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, Int64})()
╎ 385 @Base/threadingconstructs.jl:181; #2#threadsfor_fun
╎ 385 @Base/threadingconstructs.jl:214; (::var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}})(tid::Int64; onethread::Bool)
╎ 385 /mnt/d/Documents/Julia/deadlock.jl:9; macro expansion
╎ 385 @Base/abstractarray.jl:1288; getindex
╎ 385 @Base/multidimensional.jl:889; _getindex
╎ ╎ 385 @Base/multidimensional.jl:901; _unsafe_getindex(::IndexLinear, ::Array{ComplexF32, 5}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64, ::Int64)
╎ ╎ 385 @Base/abstractarray.jl:828; similar
╎ ╎ 385 @Base/array.jl:420; similar
╎ ╎ 385 @Base/boot.jl:488; Array
384╎ ╎ 385 @Base/boot.jl:481; Array
Thread 6 Task 0x00007f50c703c010 Total snapshots: 385. Utilization: 0%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Thread 7 Task 0x00007f50c7004010 Total snapshots: 385. Utilization: 100%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Thread 8 Task 0x00007f50c7000010 Total snapshots: 385. Utilization: 0%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Thread 9 Task 0x00007f50c6ff8010 Total snapshots: 385. Utilization: 100%
╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
╎ 385 @Base/condition.jl:125; wait
╎ 385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
╎ 385 @Base/task.jl:994; wait()
384╎ 385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
```

vchuravy commented 9 months ago

Hm but you were able to collect a profile. That means it didn't fully hang at that point. In the other case I never got a collected profile since we never hit a yield point.

AaronGhost commented 9 months ago

I tried the experiment multiple times:

^C^C^C^C^C^CWARNING: Force throwing a SIGINT
Segmentation fault

And I never get access to the profile. Below is one of the reports I get in this case:

<details><summary>Backtrace</summary>
<p>

signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_gc_safepoint at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:472
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:472
poptask at ./task.jl:985
wait at ./task.jl:994

wait#645 at ./condition.jl:130

wait at ./condition.jl:125 [inlined] take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7f849a3a0808) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7f844007231b) unknown function (ip: 0x7f843fd7da08) unknown function (ip: 0x7f843fd7e5eb) unknown function (ip: 0x7f843fe91946) macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4165 [inlined]

507 at /path/to/local/CUDA.jl/lib/utils/call.jl:27 [inlined]

check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined] cuLaunchKernel at /path/to/local/CUDA.jl/lib/utils/call.jl:26 [inlined]

888 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:66

macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:33 [inlined] macro expansion at ./none:0 [inlined] pack_arguments at ./none:0

launch#887 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:59

launch at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:52 [inlined]

894 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:175 [inlined]

macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:135 [inlined] macro expansion at ./none:0 [inlined] convert_arguments at ./none:0 [inlined]

cudacall#893 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:177 [inlined]

cudacall at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:173 [inlined] macro expansion at /path/to/local/CUDA.jl/src/compiler/execution.jl:266 [inlined] macro expansion at ./none:0 [inlined]

call#1085 at ./none:0

unknown function (ip: 0x7f849a39e685) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 call at ./none:0 [inlined]

_#1100 at /path/to/local/CUDA.jl/src/compiler/execution.jl:389

HostKernel at /path/to/local/CUDA.jl/src/compiler/execution.jl:388 [inlined] macro expansion at /path/to/local/CUDA.jl/src/compiler/execution.jl:114 [inlined]

mapreducedim!#1161 at /path/to/local/CUDA.jl/src/mapreduce.jl:271

mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined]

_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:67

_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]

mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]

mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]

_sum#831 at ./reducedim.jl:1015 [inlined]

_sum at ./reducedim.jl:1015 [inlined]

_sum#830 at ./reducedim.jl:1014 [inlined]

_sum at ./reducedim.jl:1014 [inlined]

sum#828 at ./reducedim.jl:1010 [inlined]

sum at ./reducedim.jl:1010 [inlined] macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]

54#threadsfor_fun#10 at ./threadingconstructs.jl:214

54#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]

1 at ./threadingconstructs.jl:153

unknown function (ip: 0x7f849a3bef52) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) _mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined] jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined] ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502 maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined] jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined] jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350 jl_gcalloc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined] _newarray at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined] _new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined] ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450 Array at ./boot.jl:481 [inlined] Array at ./boot.jl:488 [inlined] similar at ./array.jl:420 [inlined] similar at ./abstractarray.jl:828 [inlined] _unsafe_getindex at ./multidimensional.jl:901 _getindex at ./multidimensional.jl:889 [inlined] getindex at ./abstractarray.jl:1288 [inlined] macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]

54#threadsfor_fun#10 at ./threadingconstructs.jl:214

54#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]

1 at ./threadingconstructs.jl:153

unknown function (ip: 0x7f849a3bef52) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7f849a38798c) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) copy at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:75 unknown function (ip: (nil))

============================================================== Profile collected. A report will print at the next yield point


</p></details>

In a few cases on WSL, I manage to get a profile out by continuing to send interrupt signals. I assume I happen to interrupt one particular function, but I am really not sure what is going on here. I get something out about once in 10 tries.
```julia
==============================================================
Profile collected. A report will print at the next yield point
==============================================================

^C^C^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: LoadError: InterruptException:
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:931
  [2] wait()
    @ Base ./task.jl:995
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait
    @ Base ./condition.jl:125 [inlined]
  [5] _wait(t::Task)
    @ Base ./task.jl:310
  [6] ^Cthreading_run(fun::var"#39#threadsfor_fun#8"{var"#39#threadsfor_fun#7#9"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, static::Bool)
    @ Base.Threads ./threadingconstructs.jl:166
  [7] macro expansion
    @ ./threadingconstructs.jl:219 [inlined]
  [8] main()
    @ Main /path/to/test_deadlock.jl:7
  [9] top-level scope
    @ /path/to/test_deadlock.jl:15
 [10] include(fname::String)
    @ Base.MainInclude ./client.jl:489
 [11] top-level scope
    @ REPL[2]:1
 [12] top-level scope
    @ /path/to/local/CUDA.jl/src/initialization.jl:206
in expression starting at /path/to/test_deadlock.jl:15
```

maleadt commented 9 months ago

Can you try the latest version of the PR (which marks all ccalls as gc-safe)?
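
For readers unfamiliar with the term, here is a minimal sketch of what marking a `ccall` as gc-safe can look like. It is not the PR's code: `gcsafe_usleep` is a hypothetical helper, and the POSIX `usleep` call merely stands in for a blocking driver call. The only runtime entry points used are `jl_gc_safe_enter` and `jl_gc_safe_leave`, the latter of which also appears in the backtraces in this thread. The idea, as far as I understand it, is that a thread inside a GC-safe region is treated as already stopped, so a collection started on another thread does not have to wait for the C call to return.

```julia
# Hypothetical helper, not CUDA.jl's actual implementation (POSIX only).
function gcsafe_usleep(microseconds::Integer)
    # Convert the argument up front so nothing Julia-side runs inside the region.
    us = Cuint(microseconds)
    # Enter a GC-safe region: this thread promises not to touch
    # Julia-managed memory until it leaves the region again.
    gc_state = ccall(:jl_gc_safe_enter, Int8, ())
    # Stand-in for a blocking library call (assumption: any C call that
    # does not call back into Julia could be placed here).
    ret = ccall(:usleep, Cint, (Cuint,), us)
    # Leave the region and restore the previous GC state; if a collection
    # is in progress, the thread waits here instead of blocking the GC.
    ccall(:jl_gc_safe_leave, Cvoid, (Int8,), gc_state)
    return ret
end
```

Calling `gcsafe_usleep(1000)` sleeps for about a millisecond while leaving other threads free to run the garbage collector.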

AaronGhost commented 9 months ago

Thanks. I tried the latest version of the PR and can no longer make my MWE deadlock on WSL or Windows with Julia 1.10. However, my original code still deadlocks with the latest version on 1.10, while it finishes normally on 1.9.

I reduced the original code again; the resulting MWE is very similar to the previous one except for the FFT plan. Below are the new MWE and the backtrace obtained on WSL with 8 threads:

using CUDA
using ChunkSplitters

function main()
    data = rand(ComplexF32, (100, 100, 8, 20, 200))
    cu_result = CUDA.zeros(ComplexF32, (100, 100, 20, 200))
    plans = [CUDA.CUFFT.plan_bfft(CUDA.zeros(ComplexF32, (100, 100, 8)), 1:2) for _ in 1:Threads.nthreads()]

    Threads.@threads for (ichunk, chunk) in enumerate(chunks(axes(data, 5); n=Threads.nthreads()))
        for i in chunk
            for t in axes(data, 4)
                cu_result[:, :, t, i] .= sum(plans[ichunk] * CuArray(data[:, :, :, t, i]))
            end
        end
    end
end

println(getpid())
for i in 1:5
    println("Run $i")
    main()
end
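
The `println(getpid())` line in the reproducer prints the process ID, which makes it possible to request a report from a second terminal while the program hangs. On Linux, Julia 1.9 and later respond to `SIGUSR1` by printing a stack trace followed by a short profile, which matches the `signal (10): User defined signal 1` lines in the reports. A minimal sketch of sending that signal from another Julia session follows; `send_signal` and `request_report` are hypothetical helper names, and the constant assumes x86-64 Linux.

```julia
# Linux-only sketch. SIGUSR1 is 10 on x86-64 Linux, which matches the
# "signal (10): User defined signal 1" lines in the reports.
const SIGUSR1_LINUX = 10

# Hypothetical helper: thin wrapper around the C library's kill(2).
# Returns true when the call succeeded. With sig == 0 no signal is sent;
# the call only checks that the target process exists and may be signalled.
send_signal(pid::Integer, sig::Integer) =
    ccall(:kill, Cint, (Cint, Cint), pid, sig) == 0

# Ask the Julia process with the given PID for a stack trace and profile.
request_report(pid::Integer) = send_signal(pid, SIGUSR1_LINUX)
```

From a second session, `request_report(pid)` with the printed PID has the same effect as running `kill -USR1` on that PID from a shell.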
Backtrace

``` ====================================================================================== Information request received. A stacktrace will print followed by a 1.0 second profile ====================================================================================== cmd: /home/username/julia-1.10.0/bin/julia 776 running 15 of 15 signal (10): User defined signal 1 pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) multiq_check_empty at ./partr.jl:186 jfptr_multiq_check_empty_75167.1 at /home/username/julia-1.10.0/lib/julia/sys.so (unknown line) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 check_empty at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:340 [inlined] ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:388 poptask at ./task.jl:985 wait at ./task.jl:994 #wait#645 at ./condition.jl:130 wait at ./condition.jl:125 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) _jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/threading.c:927 jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined] ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:286 ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524 poptask at ./task.jl:985 wait at ./task.jl:994 #wait#645 at ./condition.jl:130 wait at ./condition.jl:125 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:351 [inlined] jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:344 [inlined] jl_gc_safe_leave at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:465macro expansion at /path/to/local/CUDA.jl/lib/utils/call.jl:204 [inlined] unchecked_cuStreamSynchronize at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4023 [inlined] #920 at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:126 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:56 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277 ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524 poptask at ./task.jl:985 wait at ./task.jl:994 #wait#645 at ./condition.jl:130 wait at ./condition.jl:125 [inlined] take! 
at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53 synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120 unknown function (ip: 0x7fcb27c66c08) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line) start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcaee2f131b) unknown function (ip: 0x7fcaedfe2021) unknown function (ip: 0x7fcaee0e4546) macro expansion at /path/to/local/CUDA.jl/lib/utils/call.jl:203 [inlined] macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4848 [inlined] #705 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined] cuOccupancyMaxPotentialBlockSize at /path/to/local/CUDA.jl/lib/utils/call.jl:29 unknown function (ip: 0x7fcb27ca9802) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 #launch_configuration#901 at /path/to/local/CUDA.jl/lib/cudadrv/occupancy.jl:75 launch_configuration at /path/to/local/CUDA.jl/lib/cudadrv/occupancy.jl:60 [inlined] #mapreducedim!#1160 at /path/to/local/CUDA.jl/src/mapreduce.jl:236 mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined] #mapreducedim!#1160 at /path/to/local/CUDA.jl/src/mapreduce.jl:274 mapreducedim! 
at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined] #_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:67 _mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined] #mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined] #_sum#831 at ./reducedim.jl:1015 [inlined] _sum at ./reducedim.jl:1015 [inlined] #_sum#830 at ./reducedim.jl:1014 [inlined] _sum at ./reducedim.jl:1014 [inlined] #sum#828 at ./reducedim.jl:1010 [inlined] sum at ./reducedim.jl:1010 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcaee2f131b) unknown function (ip: 0x7fcaedffca08) unknown function (ip: 0x7fcaedffd5eb) unknown function (ip: 0x7fcaee110946) unknown function (ip: 0x7fca7440b04c) unknown function (ip: 0x7fca744302de) unknown function (ip: 0x7fca7443a879) unknown function (ip: 0x7fca744272d9) unknown function (ip: 0x7fca7442b6bc) unknown function (ip: 0x7fca74455bda) unknown function (ip: 0x7fca74408fe9) cufftXtExecDescriptor at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) cufftXtExec at 
/home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) macro expansion at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:229 [inlined] #46 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 [inlined] retry_reclaim at /path/to/local/CUDA.jl/src/pool.jl:370 [inlined] check at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:18 [inlined] cufftExecC2C at /path/to/local/CUDA.jl/lib/utils/call.jl:29 unsafe_execute! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:332 unsafe_execute_trailing! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:401 * at /path/to/local/CUDA.jl/lib/cufft/fft.jl:455 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) #189 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 
unknown function (ip: (nil)) _mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined] jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined] ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502 maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined] jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined] jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350 jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined] _new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined] _new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined] ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450 Array at ./boot.jl:481 [inlined] Array at ./boot.jl:488 [inlined] similar at ./array.jl:420 [inlined] similar at ./abstractarray.jl:828 [inlined] _unsafe_getindex at ./multidimensional.jl:901 _getindex at ./multidimensional.jl:889 [inlined] getindex at ./abstractarray.jl:1288 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 
unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcb27c7e9dc) unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277 unknown function (ip: (nil)) pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) unknown function (ip: 0x7fcaee2f131b) unknown function (ip: 0x7fcaedffca08) unknown function (ip: 0x7fcaedffd5eb) unknown function (ip: 0x7fcaee110946) unknown function (ip: 0x7fca7440b04c) unknown function (ip: 0x7fca744302de) unknown function (ip: 0x7fca7443a879) unknown function (ip: 0x7fca744272d9) unknown function (ip: 0x7fca7442b6bc) unknown function (ip: 0x7fca74455bda) unknown function 
(ip: 0x7fca74408fe9) cufftXtExecDescriptor at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) cufftXtExec at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line) macro expansion at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:229 [inlined] #46 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 [inlined] retry_reclaim at /path/to/local/CUDA.jl/src/pool.jl:370 [inlined] check at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:18 [inlined] cufftExecC2C at /path/to/local/CUDA.jl/lib/utils/call.jl:29 unsafe_execute! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:332 unsafe_execute_trailing! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:401 * at /path/to/local/CUDA.jl/lib/cufft/fft.jl:455 [inlined] macro expansion at /local/directory/deadlock.jl:12 [inlined] #2#threadsfor_fun#2 at ./threadingconstructs.jl:214 #2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined] #1 at ./threadingconstructs.jl:153 unknown function (ip: 0x7fcb27c84562) _jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined] ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined] start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238 unknown function (ip: (nil)) pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883 jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173 jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined] segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined] segv_handler at 
/cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350 _IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line) jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:351 [inlined] jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:344 [inlined] jl_gc_safe_leave at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:465unknown function (ip: (nil)) ============================================================== Profile collected. A report will print at the next yield point ============================================================== ```

maleadt commented 9 months ago

I extended the PR to cover all libraries, i.e., including cuFFT. Can you test again?

AaronGhost commented 9 months ago

Problem fixed with the latest version! Thank you very much!