Closed christiangnrd closed 1 month ago
https://buildkite.com/julialang/metal-dot-jl/builds/1187#01919a97-f080-4e37-8f60-7422426ea8ef is curious, as the metal
test suite doesn't launch any kernels. So maybe this isn't due to bad codegen?
I believe the offending commit is https://github.com/JuliaLang/julia/commit/5f36833b71e8bcce6ab9c89e8e98246638e465a1 from https://github.com/JuliaLang/julia/pull/50144.
I was able to reliably reproduce the hang by running
for _ in 1:100
include("test/metal.jl")
end
in two different terminals simultaneously. (Both 1.11)
It always hangs at the second wait_completed
https://github.com/JuliaGPU/Metal.jl/blob/04b5481f58f393538ed5efac1408c5ee98c616b9/test/metal.jl#L317 of the testset.
I don't know if the offending Julia commit is the cause of the bug or if it just uncovered it, but hopefully this is a good starting point for someone with a better understanding of this part of the codebase to investigate this further.
This might also help? Collected a sample of the hung process from activity monitor.
MWE:
using Metal, .MTL
dev = first(devices())
cmdq = MTLCommandQueue(dev)
cmdbuf = MTLCommandBuffer(cmdq)
scheduled = Ref(false)
completed = Ref(false)
on_scheduled(cmdbuf) do buf
GC.gc(true)
scheduled[] = true
end
on_completed(cmdbuf) do buf
GC.gc(true)
completed[] = true
end
@assert scheduled[] == false
@assert completed[] == false
@assert cmdbuf.status == MTL.MTLCommandBufferStatusNotEnqueued
enqueue!(cmdbuf)
@assert cmdbuf.status == MTL.MTLCommandBufferStatusEnqueued
commit!(cmdbuf)
print("start... ")
wait_completed(cmdbuf)
println("stop")
Hangs on 1.10 and 1.12 on the first iteration.
Now that 1.11 compilations are fixed, we're back to sporadic hanging of CI.
See example 1, example 2, example 3.
Bonus local output
``` (Metal) pkg> test Testing Metal Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_aiC8B7/Project.toml` [79e6a3ab] Adapt v4.0.4 [ab4f0b2a] BFloat16s v0.5.0 [6e4b80f9] BenchmarkTools v1.5.0 [0c68f7d7] GPUArrays v10.3.1 [63c18a36] KernelAbstractions v0.9.25 [929cbde3] LLVM v9.0.0 [dde4c033] Metal v1.3.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v3.0.0 [276daf66] SpecialFunctions v2.4.0 [90137ffa] StaticArrays v1.9.7 [10745b16] Statistics v1.11.1 [ade2ca70] Dates v1.11.0 [8ba89e20] Distributed v1.11.0 [b77e0a4c] InteractiveUtils v1.11.0 [37e2e46d] LinearAlgebra v1.11.0 [de0858da] Printf v1.11.0 [3fa0cd96] REPL v1.11.0 [9a3f8284] Random v1.11.0 [2f01184e] SparseArrays v1.11.0 [8dfed614] Test v1.11.0 Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_aiC8B7/Manifest.toml` [79e6a3ab] Adapt v4.0.4 [a9b6321e] Atomix v0.1.0 [ab4f0b2a] BFloat16s v0.5.0 [6e4b80f9] BenchmarkTools v1.5.0 [fa961155] CEnum v0.5.0 [523fee87] CodecBzip2 v0.8.4 [ffbed154] DocStringExtensions v0.9.3 [e2ba6199] ExprTools v0.1.10 [0c68f7d7] GPUArrays v10.3.1 [46192b85] GPUArraysCore v0.1.6 [61eb1bfa] GPUCompiler v0.27.4 [92d709cd] IrrationalConstants v0.2.2 [692b3bcd] JLLWrappers v1.5.0 [682c06a0] JSON v0.21.4 [63c18a36] KernelAbstractions v0.9.25 [929cbde3] LLVM v9.0.0 [2ab3a3ac] LogExpFunctions v0.3.28 [1914dd2f] MacroTools v0.5.13 [dde4c033] Metal v1.3.0 `~/.julia/dev/Metal` [d8793406] ObjectFile v0.4.2 [e86c9b32] ObjectiveC v3.0.0 [69de0a69] Parsers v2.8.1 [aea7be01] PrecompileTools v1.2.1 [21216c6a] Preferences v1.4.3 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.0 [6c6a2e73] Scratch v1.2.1 [276daf66] SpecialFunctions v2.4.0 [90137ffa] StaticArrays v1.9.7 [1e83bf80] StaticArraysCore v1.4.3 [10745b16] Statistics v1.11.1 [53d494c1] StructIO v0.3.1 [a759f4b9] TimerOutputs v0.5.24 [3bb67fe8] TranscodingStreams v0.11.2 [013be700] UnsafeAtomics v0.2.1 [d80eeb9a] UnsafeAtomicsLLVM v0.2.1 [6e34b625] Bzip2_jll v1.0.8+1 [2e619515] Expat_jll v2.6.2+0 [f52de702] LLVMDowngrader_jll v0.3.0+1 [dad2f222] LLVMExtra_jll v0.0.33+0 [7106de7a] LibMPDec_jll v2.5.1+0 ⌅ [e9f186c6] Libffi_jll v3.2.2+1 [458c3c95] OpenSSL_jll v3.0.14+0 [efe28fd5] OpenSpecFun_jll v0.5.5+0 [93d3a430] Python_jll v3.10.14+0 [76ed43ae] SQLite_jll v3.45.3+0 [ffd25f8a] XZ_jll v5.4.6+0 [0dad84c5] ArgTools v1.1.2 [56f22d72] Artifacts v1.11.0 [2a0f44e3] Base64 v1.11.0 [ade2ca70] Dates v1.11.0 [8ba89e20] Distributed v1.11.0 [f43a241f] Downloads v1.6.0 [7b1f6079] FileWatching v1.11.0 [b77e0a4c] InteractiveUtils v1.11.0 [4af54fe1] LazyArtifacts v1.11.0 [b27032c2] LibCURL v0.6.4 [76f85450] LibGit2 v1.11.0 [8f399da3] Libdl v1.11.0 [37e2e46d] LinearAlgebra v1.11.0 [56ddb016] Logging v1.11.0 [d6f4376e] Markdown v1.11.0 [a63ad114] Mmap v1.11.0 [ca575930] NetworkOptions v1.2.0 [44cfe95a] Pkg v1.11.0 [de0858da] Printf v1.11.0 [9abbd945] Profile v1.11.0 [3fa0cd96] REPL v1.11.0 [9a3f8284] Random v1.11.0 [ea8e919c] SHA v0.7.0 [9e88b42a] Serialization v1.11.0 [6462fe0b] Sockets v1.11.0 [2f01184e] SparseArrays v1.11.0 [f489334b] StyledStrings v1.11.0 [fa267f1f] TOML v1.0.3 [a4e569a6] Tar v1.10.0 [8dfed614] Test v1.11.0 [cf7118a7] UUIDs v1.11.0 [4ec0a83e] Unicode v1.11.0 [e66e0078] CompilerSupportLibraries_jll v1.1.1+0 [deac9b47] LibCURL_jll v8.6.0+0 [e37daf67] LibGit2_jll v1.7.2+0 [29816b5a] LibSSH2_jll v1.11.0+1 [c8ffd9c3] MbedTLS_jll v2.28.6+0 [14a3606d] MozillaCACerts_jll v2023.12.12 [4536629a] OpenBLAS_jll v0.3.27+1 [05823500] OpenLibm_jll v0.8.1+2 [bea87d4a] SuiteSparse_jll v7.7.0+0 [83775a58] Zlib_jll v1.2.13+1 [8e850b90] libblastrampoline_jll v5.11.0+0 [8e850ede] nghttp2_jll v1.59.0+0 [3f19e933] p7zip_jll v17.4.0+2 Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. Testing Running tests... ┌ Info: System information: │ macOS 14.6.1, Darwin 23.6.0 │ │ Toolchain: │ - Julia: 1.11.0-rc3 │ - LLVM: 16.0.6 │ │ Julia packages: │ - Metal.jl: 1.3.0 │ - LLVMDowngrader_jll: 0.3.0+1 │ │ 1 device: └ - Apple M2 Max (64.000 KiB allocated) [ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable. | | ---------------- CPU ---------------- | Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | metallib (8) | 0.62 | 0.01 | 2.0 | 214.88 | 547.77 | pool (9) | 1.14 | 0.03 | 2.5 | 325.48 | 595.09 | From worker 8: Starting recording with the Blank template and GPU, Time Profiler, Metal Application, Metal GPU Counters, Metal Resource Events, os_signpost Instruments. Attaching to: julia (23790). From worker 8: Ctrl-C to stop the recording From worker 8: Stopping recording... metal (7) | 4.59 | 0.17 | 3.8 | 930.59 | 813.38 | From worker 7: ┌ Warning: Skipping script tests From worker 7: └ @ Main ~/.julia/dev/Metal/test/scripts.jl:9 scripts (7) | 0.55 | 0.00 | 0.0 | 78.42 | 822.00 | From worker 10: 2024-08-29 10:43:58.916 julia[23792:284856] Metal GPU Frame Capture Enabled From worker 8: Recording completed. Saving output file... From worker 8: Output file saved as: julia_1.trace From worker 8: [ Info: System trace saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_pNVkk4/julia_1.trace; open the resulting trace in Instruments profiling (8) | 9.36 | 0.01 | 0.1 | 114.46 | 570.70 | From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_UfApxE/julia_1.gputrace; open the resulting trace in Xcode From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_UfApxE/julia_2.gputrace; open the resulting trace in Xcode capturing (10) | 9.87 | 0.18 | 1.8 | 1080.20 | 719.06 | execution (5) | 18.02 | 0.28 | 1.6 | 2198.88 | 766.20 | mps/matrix (11) | 0.58 | 0.00 | 0.0 | 82.76 | 556.66 | mps/size (11) | 0.04 | 0.00 | 0.0 | 1.41 | 564.39 | examples (4) | 23.70 | 0.57 | 2.4 | 2662.96 | 2038.67 | mps/vector (11) | 0.13 | 0.00 | 0.0 | 12.66 | 565.38 | random (9) | 24.50 | 0.40 | 1.6 | 3500.97 | 978.94 | gpuarrays/indexing scalar (4) | 6.65 | 0.08 | 1.2 | 1091.20 | 2089.61 | kernelabstractions (6) | 31.11 | 0.65 | 2.1 | 4947.82 | 1138.53 | mps/linalg (5) | 16.84 | 0.35 | 2.1 | 3096.05 | 976.09 | mps/copy (8) | 33.03 | 0.75 | 2.3 | 6544.86 | 989.70 | array (2) | 45.54 | 0.71 | 1.6 | 6176.39 | 1094.27 | gpuarrays/interface (2) | 0.94 | 0.01 | 1.4 | 236.47 | 1142.72 | gpuarrays/indexing find (5) | 15.72 | 0.42 | 2.7 | 5267.93 | 1181.33 | gpuarrays/math/power (4) | 20.75 | 0.48 | 2.3 | 4750.50 | 2305.39 | gpuarrays/reductions/any all count (2) | 8.32 | 0.12 | 1.5 | 1691.97 | 1204.00 | gpuarrays/uniformscaling (4) | 4.67 | 0.02 | 0.5 | 502.92 | 2437.33 | gpuarrays/linalg/mul!/vector-matrix (6) | 28.20 | 0.41 | 1.5 | 5208.05 | 1341.78 | gpuarrays/math/intrinsics (6) | 0.91 | 0.00 | 0.0 | 107.54 | 1347.39 | gpuarrays/indexing multidimensional (8) | 25.80 | 0.45 | 1.7 | 5203.15 | 1279.70 | gpuarrays/reductions/mapreducedim!_large (2) | 27.58 | 0.79 | 2.9 | 6800.30 | 1778.45 | gpuarrays/linalg/norm (6) | 28.24 | 0.54 | 1.9 | 6401.18 | 1584.08 | gpuarrays/reductions/reducedim! (11) | 67.84 | 1.30 | 1.9 | 12473.56 | 1249.77 | gpuarrays/vectors (11) | 0.18 | 0.00 | 0.0 | 27.23 | 1252.45 | gpuarrays/linalg/mul!/matrix-matrix (4) | 38.05 | 0.45 | 1.2 | 5872.34 | 2530.66 | gpuarrays/statistics (8) | 26.77 | 0.40 | 1.5 | 4574.18 | 1491.39 | gpuarrays/linalg (9) | 73.68 | 1.55 | 2.1 | 14299.43 | 1578.47 | gpuarrays/random (11) | 8.23 | 0.08 | 1.0 | 1146.98 | 1366.86 | gpuarrays/constructors (6) | 16.62 | 0.15 | 0.9 | 1804.54 | 1701.91 | gpuarrays/base (4) | 17.28 | 0.44 | 2.6 | 3418.14 | 2573.62 | gpuarrays/reductions/== isequal (8) | 25.87 | 0.53 | 2.1 | 5815.18 | 1610.95 | gpuarrays/reductions/mapreducedim! (11) | 26.83 | 0.43 | 1.6 | 5249.56 | 1554.47 | gpuarrays/reductions/minimum maximum extrema (5) | 105.34 | 2.37 | 2.2 | 22907.86 | 1966.19 | gpuarrays/reductions/reduce (6) | 56.45 | 1.32 | 2.3 | 13019.02 | 2042.95 | gpuarrays/reductions/mapreduce (2) | 81.31 | 1.81 | 2.2 | 17933.78 | 2046.83 | gpuarrays/reductions/sum prod (4) | 77.72 | 1.54 | 2.0 | 16463.69 | 3235.77 | gpuarrays/broadcasting (9) | 108.09 | 1.93 | 1.8 | 19730.51 | 2196.94 | ┌ Warning: Test timed out: device/intrinsics └ @ Main ~/.julia/dev/Metal/test/runtests.jl:247 From worker 7: From worker 7: ====================================================================================== From worker 7: Information request received. A stacktrace will print followed by a 1.0 second profile From worker 7: ====================================================================================== From worker 7: From worker 7: signal (29): Information request: 29 From worker 7: __psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line) From worker 7: unknown function (ip: 0x0) From worker 7: __psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line) From worker 7: unknown function (ip: 0x0) From worker 7: From worker 7: ============================================================== From worker 7: Profile collected. A report will print at the next yield point From worker 7: ============================================================== From worker 7: ┌ Warning: Couldn't kill worker 7, killing process 23789 forcefully └ @ Main ~/.julia/dev/Metal/test/runtests.jl:256 From worker 7: From worker 7: [23789] signal 15: Terminated: 15 From worker 7: in expression starting at /Users/christian/.julia/dev/Metal/test/device/intrinsics.jl:4 From worker 7: __psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line) From worker 7: unknown function (ip: 0x0) From worker 7: __psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line) From worker 7: unknown function (ip: 0x0) From worker 7: Allocations: 35316542 (Pool: 35315514; Big: 1028); GC: 24 device/intrinsics (7) | failed at 2024-08-29T10:52:12.607 Testing finished in 8 minutes, 23 seconds, 33 milliseconds device/intrinsics: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(7) Test Summary: | Pass Error Broken Total Time Overall | 9617 1 9 9627 metallib | 25 25 pool | 5 5 metal | 131 131 scripts | 0 profiling | 1 1 capturing | 24 24 execution | 35 35 mps/matrix | 75 75 mps/size | 9 9 examples | 4 4 mps/vector | 34 34 random | 817 1 818 gpuarrays/indexing scalar | 399 399 kernelabstractions | 2179 8 2187 mps/linalg | 45 45 mps/copy | 144 144 array | 374 374 gpuarrays/interface | 7 7 gpuarrays/indexing find | 45 45 gpuarrays/math/power | 60 60 gpuarrays/reductions/any all count | 101 101 gpuarrays/uniformscaling | 56 56 gpuarrays/linalg/mul!/vector-matrix | 140 140 gpuarrays/math/intrinsics | 10 10 gpuarrays/indexing multidimensional | 89 89 gpuarrays/reductions/mapreducedim!_large | 40 40 gpuarrays/linalg/norm | 264 264 gpuarrays/reductions/reducedim! | 160 160 gpuarrays/vectors | 10 10 gpuarrays/linalg/mul!/matrix-matrix | 360 360 gpuarrays/statistics | 52 52 gpuarrays/linalg | 397 397 gpuarrays/random | 52 52 gpuarrays/constructors | 832 832 gpuarrays/base | 93 93 gpuarrays/reductions/== isequal | 248 248 gpuarrays/reductions/mapreducedim! | 260 260 gpuarrays/reductions/minimum maximum extrema | 555 555 gpuarrays/reductions/reduce | 220 220 gpuarrays/reductions/mapreduce | 330 330 gpuarrays/reductions/sum prod | 636 636 gpuarrays/broadcasting | 299 299 device/intrinsics | 1 1 FAILURE Error in testset device/intrinsics: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(7) ERROR: LoadError: Test run finished with errors in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:410 ERROR: Package Metal errored during testing ```Maybe related to #329?