JuliaGPU / Metal.jl

Metal programming in Julia
MIT License

Command buffer callbacks can cause bus error during thread adoption #138

Closed christiangnrd closed 1 year ago

christiangnrd commented 1 year ago

While running the tests for #132, I encountered an error in gpuarrays/linalg that did not appear in the next run. I'm pretty sure this isn't the first time I've run into this bug, but I never paid attention to it because it would go away after rerunning.

It happened on an M2 Max while running the 1.9.0-rc1 tests for #126 after rebasing on main.

This is all I can really say about it for now. I'll add more details if I encounter it in different circumstances.

Error output, with the expected errors that will be fixed by #136 removed:

``` Testing Running tests... ┌ Info: System information: │ macOS 13.2.1, Darwin 21.4.0 │ │ Toolchain: │ - Julia: 1.9.0-rc1 │ - LLVM: 14.0.6 │ │ 1 device: └ - Apple M2 Max (64.000 KiB allocated) ┌ Info: Using Metal LLVM back-end from /Users/christian/.julia/artifacts/3c74b0072cc694992a9d90b5778fb28f7ec53251/bin: │ LLVM (http://llvm.org/): │ LLVM version 14.0.0 │ Optimized build. │ Default target: aarch64-apple-darwin22.3.0 └ Host CPU: cyclone [ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable. | | ---------------- CPU ---------------- | Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | From worker 2: ┌ Warning: Metal does not support Float64 values, try using Float32 instead From worker 2: └ @ Metal ~/.julia/dev/Metal/src/array.jl:38 metal (5) | 1.62 | 0.03 | 1.8 | 226.30 | 480.25 | mps (6) | 2.50 | 0.06 | 2.2 | 439.30 | 475.64 | From worker 10: 2023-03-17 11:40:26.883 julia[29514:132830] Metal GPU Frame Capture Enabled execution (4) | 7.58 | 0.23 | 3.0 | 1272.39 | 589.84 | From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_hyw58a/test.gputrace/julia_capture_1.gputrace/ profiling (10) | 6.43 | 0.21 | 3.3 | 969.57 | 558.34 | gpuarrays/indexing scalar (9) | 11.29 | 0.35 | 3.1 | 1880.50 | 624.77 | array (2) | 15.90 | 0.53 | 3.3 | 2776.12 | 756.33 | device/intrinsics (8) | 17.77 | 0.59 | 3.3 | 3267.70 | 679.89 | gpuarrays/interface (8) | 1.63 | 0.10 | 6.0 | 356.34 | 687.48 | gpuarrays/indexing multidimensional (2) | 11.87 | 0.43 | 3.6 | 2322.29 | 825.42 | gpuarrays/math/power (4) | 20.37 | 1.24 | 6.1 | 3803.57 | 891.00 | gpuarrays/reductions/any all count (8) | 9.85 | 0.47 | 4.8 | 2528.00 | 738.88 | gpuarrays/uniformscaling (4) | 3.98 | 0.06 | 1.6 | 524.28 | 912.48 | examples (3) | 37.07 | 0.00 | 0.0 | 11.03 | 447.86 | gpuarrays/indexing find (11) | 23.50 | 1.40 | 5.9 | 5013.99 | 856.53 | 
gpuarrays/linalg/mul!/vector-matrix (9) | 31.28 | 0.90 | 2.9 | 5702.68 | 848.75 | gpuarrays/math/intrinsics (3) | 8.44 | 0.29 | 3.4 | 1333.39 | 618.14 | gpuarrays/reductions/mapreducedim!_large (8) | 32.91 | 0.91 | 2.8 | 8352.05 | 1650.42 | gpuarrays/linalg (6) | failed at 2023-03-17T11:41:32.879 gpuarrays/statistics (9) | 31.01 | 1.87 | 6.0 | 7226.16 | 1141.23 | gpuarrays/reductions/reducedim! (5) | 72.54 | 2.62 | 3.6 | 13272.53 | 1164.95 | gpuarrays/linalg/mul!/matrix-matrix (4) | 42.65 | 1.03 | 2.4 | 6976.63 | 1180.75 | gpuarrays/constructors (8) | 13.78 | 0.49 | 3.6 | 2271.98 | 1779.42 | gpuarrays/linalg/norm (11) | 40.44 | 2.14 | 5.3 | 7487.01 | 1127.58 | gpuarrays/base (9) | 15.35 | 0.99 | 6.5 | 3650.37 | 1309.75 | gpuarrays/random (12) | 15.21 | 0.56 | 3.7 | 2860.24 | 688.80 | gpuarrays/reductions/== isequal (5) | failed at 2023-03-17T11:42:23.920 gpuarrays/reductions/mapreducedim! (8) | 73.59 | 3.27 | 4.4 | 15854.44 | 2294.77 | gpuarrays/reductions/minimum maximum extrema (2) | 130.63 | 6.99 | 5.4 | 25798.04 | 1888.75 | gpuarrays/reductions/mapreduce (3) | 130.60 | 5.88 | 4.5 | 31158.60 | 1454.33 | gpuarrays/reductions/sum prod (9) | 90.87 | 4.67 | 5.1 | 18240.08 | 1879.81 | gpuarrays/broadcasting (4) | 110.91 | 6.51 | 5.9 | 20943.60 | 1871.67 | gpuarrays/reductions/reduce (11) | 109.63 | 4.84 | 4.4 | 19502.77 | 1723.31 | Testing finished in 3 minutes, 12 seconds, 526 milliseconds Worker 6 failed running test gpuarrays/linalg: Some tests did not pass: 232 passed, 1 failed, 0 errored, 0 broken. gpuarrays/linalg: Test Failed at /Users/christian/.julia/packages/GPUArrays/7TiO1/test/testsuite/linalg.jl:32 Expression: let x = rand(Float32, 4, [2 for _ = 2:18]...) 
pm = (18:-1:1...,) y = permutedims(x, pm) Array(GPUArrays._permutedims!(UInt64, AT(zero(y)), AT(x), pm)) ≈ y end Stacktrace: [1] backtrace() @ Base ./error.jl:114 [2] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}) @ Test ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Test/src/Test.jl:1041 [3] top-level scope @ ~/.julia/dev/Metal/test/runtests.jl:363 [4] include(fname::String) @ Base.MainInclude ./client.jl:478 [5] top-level scope @ none:6 [6] eval @ ./boot.jl:370 [inlined] [7] exec_options(opts::Base.JLOptions) @ Base ./client.jl:280 [8] _start() @ Base ./client.jl:522 Worker 5 failed running test gpuarrays/reductions/== isequal: Some tests did not pass: 242 passed, 0 failed, 6 errored, 0 broken. ### Start mapreduce threads errors . . . ### End mapreduce threads errors Test Summary: | Pass Fail Error Total Time Overall | 5823 1 6 5830 metal | 127 127 mps | 5 5 execution | 17 17 profiling | 22 22 gpuarrays/indexing scalar | 398 398 array | 190 190 device/intrinsics | 25 25 gpuarrays/interface | 7 7 gpuarrays/indexing multidimensional | 42 42 gpuarrays/math/power | 60 60 gpuarrays/reductions/any all count | 101 101 gpuarrays/uniformscaling | 56 56 examples | 3 3 gpuarrays/indexing find | 45 45 gpuarrays/linalg/mul!/vector-matrix | 140 140 gpuarrays/math/intrinsics | 10 10 gpuarrays/reductions/mapreducedim!_large | 40 40 gpuarrays/linalg | 232 1 233 gpuarrays/statistics | 52 52 gpuarrays/reductions/reducedim! | 160 160 gpuarrays/linalg/mul!/matrix-matrix | 360 360 gpuarrays/constructors | 770 770 gpuarrays/linalg/norm | 264 264 gpuarrays/base | 73 73 gpuarrays/random | 50 50 gpuarrays/reductions/== isequal | 242 6 248 gpuarrays/reductions/mapreducedim! 
| 260 260 gpuarrays/reductions/minimum maximum extrema | 555 555 gpuarrays/reductions/mapreduce | 330 330 gpuarrays/reductions/sum prod | 636 636 gpuarrays/broadcasting | 331 331 gpuarrays/reductions/reduce | 220 220 FAILURE Error in testset gpuarrays/linalg: Test Failed at /Users/christian/.julia/packages/GPUArrays/7TiO1/test/testsuite/linalg.jl:32 Expression: let x = rand(Float32, 4, [2 for _ = 2:18]...) pm = (18:-1:1...,) y = permutedims(x, pm) Array(GPUArrays._permutedims!(UInt64, AT(zero(y)), AT(x), pm)) ≈ y end Error in testset gpuarrays/reductions/== isequal: ### The rest is more mapreduce thread errors ```

christiangnrd commented 1 year ago

I was going to create a new issue for the test failure discussed here, but I checked and it's also in gpuarrays/linalg, so I'll add the error here. I can create a new issue if we determine it's a separate problem.

This time it was an error, and it happened while I was running the tests on 1.9.0-rc1 for #136.

Error:

```
From worker 4: ERROR: Exception handler triggered on unmanaged thread.
From worker 4:
From worker 4: [1648] signal (10.1): Bus error: 10
From worker 4: in expression starting at none:1
From worker 4: unknown function (ip: 0x12a25c218)
From worker 4: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
From worker 4: Allocations: 106560328 (Pool: 106485980; Big: 74348); GC: 161
From worker 4: ERROR: Exception handler triggered on unmanaged thread.
gpuarrays/linalg (4) | failed at 2023-03-17T10:16:38.823
Worker 4 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
 [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
   @ Base ./stream.jl:947
 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
   @ Base ./stream.jl:955
 [3] unsafe_read
   @ ./io.jl:761 [inlined]
 [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
   @ Base ./io.jl:760
 [5] read!
   @ ./io.jl:762 [inlined]
 [6] deserialize_hdr_raw
   @ ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined]
 [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172
 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133
 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
   @ Distributed ./task.jl:514
```

christiangnrd commented 1 year ago

New errors. I removed the expected mapreduce threads errors, which leaves the unmanaged thread issue, this time in gpuarrays/linalg/mul!/vector-matrix, and another failure in gpuarrays/statistics that might not be related.

I don't know if this is relevant, but it happened when I started the tests in two different terminal windows: one on 1.9.0 and another on 1.8.5.

Output

``` (@v1.9) pkg> test Metal Testing Metal Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_C3LUGn/Project.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [0c68f7d7] GPUArrays v8.6.4 [dde4c033] Metal v0.2.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v0.1.0 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [de0858da] Printf `@stdlib/Printf` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [8dfed614] Test `@stdlib/Test` Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_C3LUGn/Manifest.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [fa961155] CEnum v0.4.2 [e2ba6199] ExprTools v0.1.9 [0c68f7d7] GPUArrays v8.6.4 [46192b85] GPUArraysCore v0.1.4 [61eb1bfa] GPUCompiler v0.18.0 [692b3bcd] JLLWrappers v1.4.1 [682c06a0] JSON v0.21.3 [929cbde3] LLVM v4.16.0 [dde4c033] Metal v0.2.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v0.1.0 [69de0a69] Parsers v2.5.8 [21216c6a] Preferences v1.3.0 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.0 [66db9d55] SnoopPrecompile v1.0.3 [a759f4b9] TimerOutputs v0.5.22 ⌅ [dad2f222] LLVMExtra_jll v0.0.16+2 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [0dad84c5] ArgTools v1.1.1 `@stdlib/ArgTools` [56f22d72] Artifacts `@stdlib/Artifacts` [2a0f44e3] Base64 `@stdlib/Base64` [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [f43a241f] Downloads v1.6.0 `@stdlib/Downloads` [7b1f6079] FileWatching `@stdlib/FileWatching` [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils` [4af54fe1] LazyArtifacts `@stdlib/LazyArtifacts` [b27032c2] LibCURL v0.6.3 `@stdlib/LibCURL` [76f85450] LibGit2 `@stdlib/LibGit2` [8f399da3] Libdl `@stdlib/Libdl` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [56ddb016] Logging `@stdlib/Logging` 
[d6f4376e] Markdown `@stdlib/Markdown` [a63ad114] Mmap `@stdlib/Mmap` [ca575930] NetworkOptions v1.2.0 `@stdlib/NetworkOptions` [44cfe95a] Pkg v1.9.0 `@stdlib/Pkg` [de0858da] Printf `@stdlib/Printf` [9abbd945] Profile `@stdlib/Profile` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [ea8e919c] SHA v0.7.0 `@stdlib/SHA` [9e88b42a] Serialization `@stdlib/Serialization` [6462fe0b] Sockets `@stdlib/Sockets` [2f01184e] SparseArrays `@stdlib/SparseArrays` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [fa267f1f] TOML v1.0.3 `@stdlib/TOML` [a4e569a6] Tar v1.10.0 `@stdlib/Tar` [8dfed614] Test `@stdlib/Test` [cf7118a7] UUIDs `@stdlib/UUIDs` [4ec0a83e] Unicode `@stdlib/Unicode` [e66e0078] CompilerSupportLibraries_jll v1.0.2+0 `@stdlib/CompilerSupportLibraries_jll` [deac9b47] LibCURL_jll v7.84.0+0 `@stdlib/LibCURL_jll` [29816b5a] LibSSH2_jll v1.10.2+0 `@stdlib/LibSSH2_jll` [c8ffd9c3] MbedTLS_jll v2.28.2+0 `@stdlib/MbedTLS_jll` [14a3606d] MozillaCACerts_jll v2022.10.11 `@stdlib/MozillaCACerts_jll` [4536629a] OpenBLAS_jll v0.3.21+4 `@stdlib/OpenBLAS_jll` [bea87d4a] SuiteSparse_jll v5.10.1+6 `@stdlib/SuiteSparse_jll` [83775a58] Zlib_jll v1.2.13+0 `@stdlib/Zlib_jll` [8e850b90] libblastrampoline_jll v5.4.0+0 `@stdlib/libblastrampoline_jll` [8e850ede] nghttp2_jll v1.48.0+0 `@stdlib/nghttp2_jll` [3f19e933] p7zip_jll v17.4.0+0 `@stdlib/p7zip_jll` Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. Testing Running tests... ┌ Info: System information: │ macOS 13.2.1, Darwin 21.4.0 │ │ Toolchain: │ - Julia: 1.9.0-rc1 │ - LLVM: 14.0.6 │ │ 1 device: └ - Apple M2 Max (64.000 KiB allocated) ┌ Info: Using Metal LLVM back-end from /Users/christian/.julia/artifacts/3c74b0072cc694992a9d90b5778fb28f7ec53251/bin: │ LLVM (http://llvm.org/): │ LLVM version 14.0.0 │ Optimized build. │ Default target: aarch64-apple-darwin22.3.0 └ Host CPU: cyclone [ Info: Running 8 tests in parallel. 
If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable. | | ---------------- CPU ---------------- | Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | From worker 2: ┌ Warning: Metal does not support Float64 values, try using Float32 instead From worker 2: └ @ Metal ~/.julia/dev/Metal/src/array.jl:38 metal (5) | 3.12 | 0.05 | 1.5 | 227.58 | 459.56 | mps (6) | 3.50 | 0.06 | 1.8 | 439.31 | 479.56 | From worker 10: 2023-03-17 14:49:41.479 julia[13374:31199] Metal GPU Frame Capture Enabled execution (4) | 15.91 | 0.41 | 2.6 | 1272.86 | 594.77 | From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_vyiEBE/test.gputrace/julia_capture_1.gputrace/ profiling (10) | 11.51 | 0.45 | 3.9 | 970.22 | 556.52 | gpuarrays/indexing scalar (9) | 24.70 | 0.74 | 3.0 | 1880.63 | 590.33 | array (2) | 24.95 | 0.85 | 3.4 | 2775.94 | 746.81 | device/intrinsics (8) | 43.14 | 0.97 | 2.2 | 3231.27 | 662.48 | gpuarrays/interface (8) | 2.46 | 0.15 | 6.0 | 356.17 | 668.72 | gpuarrays/math/power (4) | 30.14 | 1.86 | 6.2 | 3803.08 | 845.55 | From worker 9: ERROR: Exception handler triggered on unmanaged thread. From worker 9: From worker 9: [13252] signal (10.1): Bus error: 10 From worker 9: in expression starting at none:1 From worker 9: unknown function (ip: 0x11fba0208) From worker 9: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line) From worker 9: Allocations: 72819545 (Pool: 72768195; Big: 51350); GC: 108 From worker 9: ERROR: Exception handler triggered on unmanaged thread. gpuarrays/linalg/mul!/vector-matrix (9) | failed at 2023-03-17T14:50:19.249 Worker 9 terminated. 
Unhandled Task ERROR: EOFError: read end of file Stacktrace: [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64) @ Base ./stream.jl:947 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64) @ Base ./stream.jl:955 [3] unsafe_read @ ./io.jl:761 [inlined] [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64) @ Base ./io.jl:760 [5] read! @ ./io.jl:762 [inlined] [6] deserialize_hdr_raw @ ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined] [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})() @ Distributed ./task.jl:514 gpuarrays/indexing find (2) | 30.62 | 1.82 | 6.0 | 4071.76 | 929.36 | gpuarrays/indexing multidimensional (11) | 27.37 | 0.97 | 3.6 | 3394.91 | 707.89 | examples (3) | 61.95 | 0.00 | 0.0 | 11.04 | 417.75 | gpuarrays/reductions/any all count (8) | 20.18 | 0.90 | 4.5 | 2528.35 | 784.92 | gpuarrays/math/intrinsics (3) | 17.14 | 0.50 | 2.9 | 1333.99 | 578.55 | gpuarrays/uniformscaling (12) | 20.08 | 0.64 | 3.2 | 1628.07 | 598.84 | gpuarrays/reductions/reducedim! 
(5) | 93.43 | 3.70 | 4.0 | 13272.69 | 1099.48 | gpuarrays/reductions/mapreducedim!_large (2) | 48.36 | 1.39 | 2.9 | 8455.50 | 1748.30 | gpuarrays/constructors (5) | 19.06 | 0.53 | 2.8 | 2033.26 | 1225.55 | gpuarrays/linalg (6) | 114.92 | 4.44 | 3.9 | 13406.79 | 1260.92 | gpuarrays/linalg/norm (8) | 54.85 | 2.67 | 4.9 | 7285.72 | 1076.25 | gpuarrays/random (2) | 16.96 | 0.51 | 3.0 | 1568.72 | 1761.31 | gpuarrays/statistics (3) | failed at 2023-03-17T14:51:40.318 gpuarrays/base (5) | 28.42 | 1.83 | 6.5 | 3858.13 | 1387.22 | gpuarrays/linalg/mul!/matrix-matrix (11) | 104.12 | 2.54 | 2.4 | 8821.33 | 1025.28 | gpuarrays/reductions/minimum maximum extrema (4) | 191.25 | 8.11 | 4.2 | 25941.07 | 1805.69 | gpuarrays/reductions/mapreduce (12) | 164.99 | 6.94 | 4.2 | 31142.18 | 1443.05 | gpuarrays/reductions/reduce (13) | 112.08 | 6.10 | 5.4 | 21430.78 | 1413.36 | gpuarrays/reductions/== isequal (6) | failed at 2023-03-17T14:53:48.159 gpuarrays/reductions/mapreducedim! (2) | 147.41 | 3.73 | 2.5 | 16261.74 | 2361.16 | gpuarrays/broadcasting (8) | 174.38 | 7.06 | 4.0 | 20910.35 | 1782.48 | gpuarrays/reductions/sum prod (5) | 155.83 | 4.38 | 2.8 | 17690.05 | 2005.61 | Testing finished in 5 minutes, 3 seconds, 534 milliseconds gpuarrays/linalg/mul!/vector-matrix: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(9) Worker 3 failed running test gpuarrays/statistics: Some tests did not pass: 51 passed, 1 failed, 0 errored, 0 broken. 
gpuarrays/statistics: Test Failed at /Users/christian/.julia/packages/GPUArrays/7TiO1/test/testsuite/statistics.jl:55 Expression: compare((A->begin cor(A; dims = 2) end), AT, rand(ET, s, 2), nans = true) Stacktrace: [1] backtrace() @ Base ./error.jl:114 [2] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}) @ Test ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Test/src/Test.jl:1041 [3] top-level scope @ ~/.julia/dev/Metal/test/runtests.jl:363 [4] include(fname::String) @ Base.MainInclude ./client.jl:478 [5] top-level scope @ none:6 [6] eval @ ./boot.jl:370 [inlined] [7] exec_options(opts::Base.JLOptions) @ Base ./client.jl:280 [8] _start() @ Base ./client.jl:522 ### "== isequal" errors removed Test Summary: | Pass Fail Error Total Time Overall | 5683 1 7 5691 metal | 127 127 mps | 5 5 execution | 17 17 profiling | 22 22 gpuarrays/indexing scalar | 398 398 array | 190 190 device/intrinsics | 25 25 gpuarrays/interface | 7 7 gpuarrays/math/power | 60 60 gpuarrays/linalg/mul!/vector-matrix | 1 1 gpuarrays/indexing find | 45 45 gpuarrays/indexing multidimensional | 42 42 examples | 3 3 gpuarrays/reductions/any all count | 101 101 gpuarrays/math/intrinsics | 10 10 gpuarrays/uniformscaling | 56 56 gpuarrays/reductions/reducedim! | 160 160 gpuarrays/reductions/mapreducedim!_large | 40 40 gpuarrays/constructors | 770 770 gpuarrays/linalg | 233 233 gpuarrays/linalg/norm | 264 264 gpuarrays/random | 50 50 gpuarrays/statistics | 51 1 52 gpuarrays/base | 73 73 gpuarrays/linalg/mul!/matrix-matrix | 360 360 gpuarrays/reductions/minimum maximum extrema | 555 555 gpuarrays/reductions/mapreduce | 330 330 gpuarrays/reductions/reduce | 220 220 gpuarrays/reductions/== isequal | 242 6 248 gpuarrays/reductions/mapreducedim! 
| 260 260 gpuarrays/broadcasting | 331 331 gpuarrays/reductions/sum prod | 636 636 FAILURE Error in testset gpuarrays/linalg/mul!/vector-matrix: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(9) Error in testset gpuarrays/statistics: Test Failed at /Users/christian/.julia/packages/GPUArrays/7TiO1/test/testsuite/statistics.jl:55 Expression: compare((A->begin cor(A; dims = 2) end), AT, rand(ET, s, 2), nans = true) ```

christiangnrd commented 1 year ago

Another one, this time with gpuarrays/linalg/mul!/matrix-matrix.

``` (@v1.9) pkg> test Metal Testing Metal Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_l6QiuX/Project.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [0c68f7d7] GPUArrays v8.6.4 [dde4c033] Metal v0.2.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v0.1.0 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [de0858da] Printf `@stdlib/Printf` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [8dfed614] Test `@stdlib/Test` Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_l6QiuX/Manifest.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [fa961155] CEnum v0.4.2 [e2ba6199] ExprTools v0.1.9 [0c68f7d7] GPUArrays v8.6.4 [46192b85] GPUArraysCore v0.1.4 [61eb1bfa] GPUCompiler v0.18.0 [692b3bcd] JLLWrappers v1.4.1 [682c06a0] JSON v0.21.3 [929cbde3] LLVM v4.16.0 [dde4c033] Metal v0.2.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v0.1.0 [69de0a69] Parsers v2.5.8 [21216c6a] Preferences v1.3.0 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.0 [66db9d55] SnoopPrecompile v1.0.3 [a759f4b9] TimerOutputs v0.5.22 ⌅ [dad2f222] LLVMExtra_jll v0.0.16+2 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [0dad84c5] ArgTools v1.1.1 `@stdlib/ArgTools` [56f22d72] Artifacts `@stdlib/Artifacts` [2a0f44e3] Base64 `@stdlib/Base64` [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [f43a241f] Downloads v1.6.0 `@stdlib/Downloads` [7b1f6079] FileWatching `@stdlib/FileWatching` [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils` [4af54fe1] LazyArtifacts `@stdlib/LazyArtifacts` [b27032c2] LibCURL v0.6.3 `@stdlib/LibCURL` [76f85450] LibGit2 `@stdlib/LibGit2` [8f399da3] Libdl `@stdlib/Libdl` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [56ddb016] Logging `@stdlib/Logging` 
[d6f4376e] Markdown `@stdlib/Markdown` [a63ad114] Mmap `@stdlib/Mmap` [ca575930] NetworkOptions v1.2.0 `@stdlib/NetworkOptions` [44cfe95a] Pkg v1.9.0 `@stdlib/Pkg` [de0858da] Printf `@stdlib/Printf` [9abbd945] Profile `@stdlib/Profile` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [ea8e919c] SHA v0.7.0 `@stdlib/SHA` [9e88b42a] Serialization `@stdlib/Serialization` [6462fe0b] Sockets `@stdlib/Sockets` [2f01184e] SparseArrays `@stdlib/SparseArrays` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [fa267f1f] TOML v1.0.3 `@stdlib/TOML` [a4e569a6] Tar v1.10.0 `@stdlib/Tar` [8dfed614] Test `@stdlib/Test` [cf7118a7] UUIDs `@stdlib/UUIDs` [4ec0a83e] Unicode `@stdlib/Unicode` [e66e0078] CompilerSupportLibraries_jll v1.0.2+0 `@stdlib/CompilerSupportLibraries_jll` [deac9b47] LibCURL_jll v7.84.0+0 `@stdlib/LibCURL_jll` [29816b5a] LibSSH2_jll v1.10.2+0 `@stdlib/LibSSH2_jll` [c8ffd9c3] MbedTLS_jll v2.28.2+0 `@stdlib/MbedTLS_jll` [14a3606d] MozillaCACerts_jll v2022.10.11 `@stdlib/MozillaCACerts_jll` [4536629a] OpenBLAS_jll v0.3.21+4 `@stdlib/OpenBLAS_jll` [bea87d4a] SuiteSparse_jll v5.10.1+6 `@stdlib/SuiteSparse_jll` [83775a58] Zlib_jll v1.2.13+0 `@stdlib/Zlib_jll` [8e850b90] libblastrampoline_jll v5.4.0+0 `@stdlib/libblastrampoline_jll` [8e850ede] nghttp2_jll v1.48.0+0 `@stdlib/nghttp2_jll` [3f19e933] p7zip_jll v17.4.0+0 `@stdlib/p7zip_jll` Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. Testing Running tests... ┌ Info: System information: │ macOS 13.2.1, Darwin 21.4.0 │ │ Toolchain: │ - Julia: 1.9.0-rc1 │ - LLVM: 14.0.6 │ │ 1 device: └ - Apple M2 Max (64.000 KiB allocated) ┌ Info: Using Metal LLVM back-end from /Users/christian/.julia/artifacts/3c74b0072cc694992a9d90b5778fb28f7ec53251/bin: │ LLVM (http://llvm.org/): │ LLVM version 14.0.0 │ Optimized build. │ Default target: aarch64-apple-darwin22.3.0 └ Host CPU: cyclone [ Info: Running 8 tests in parallel. 
If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable. | | ---------------- CPU ---------------- | Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | From worker 2: ┌ Warning: Metal does not support Float64 values, try using Float32 instead From worker 2: └ @ Metal ~/.julia/dev/Metal/src/array.jl:38 metal (5) | 2.46 | 0.04 | 1.5 | 227.58 | 482.12 | mps (6) | 3.01 | 0.06 | 2.0 | 439.31 | 481.08 | From worker 10: 2023-03-17 15:08:51.452 julia[21294:56351] Metal GPU Frame Capture Enabled execution (4) | 14.01 | 0.36 | 2.6 | 1272.86 | 606.38 | gpuarrays/indexing scalar (9) | 16.13 | 0.41 | 2.5 | 1880.42 | 627.48 | From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_7iclNx/test.gputrace/julia_capture_1.gputrace/ profiling (10) | 12.61 | 0.43 | 3.4 | 970.22 | 563.61 | array (2) | 22.14 | 0.68 | 3.1 | 2775.94 | 760.55 | device/intrinsics (8) | 36.88 | 0.76 | 2.1 | 3231.27 | 695.14 | gpuarrays/interface (8) | 2.33 | 0.13 | 5.8 | 356.10 | 702.08 | gpuarrays/math/power (4) | 29.13 | 1.81 | 6.2 | 3803.07 | 841.95 | gpuarrays/indexing find (2) | 28.67 | 1.88 | 6.6 | 4071.55 | 943.05 | gpuarrays/uniformscaling (2) | 8.15 | 0.14 | 1.8 | 643.77 | 969.77 | examples (3) | 61.19 | 0.00 | 0.0 | 11.03 | 420.02 | gpuarrays/reductions/any all count (8) | 21.25 | 0.92 | 4.3 | 2528.35 | 758.12 | gpuarrays/math/intrinsics (8) | 3.09 | 0.08 | 2.5 | 239.84 | 769.11 | gpuarrays/indexing multidimensional (11) | 34.00 | 1.17 | 3.5 | 3394.29 | 697.84 | gpuarrays/linalg/mul!/vector-matrix (9) | 61.78 | 1.66 | 2.7 | 5702.98 | 863.64 | gpuarrays/reductions/reducedim! 
(5) | 87.55 | 3.70 | 4.2 | 13272.68 | 1107.28 | gpuarrays/reductions/mapreducedim!_large (2) | 49.24 | 1.39 | 2.8 | 8354.85 | 1823.91 | gpuarrays/constructors (5) | 18.90 | 0.54 | 2.9 | 2033.32 | 1241.78 | gpuarrays/linalg/norm (8) | 53.63 | 2.60 | 4.8 | 7275.33 | 1115.86 | gpuarrays/linalg (6) | 117.12 | 4.50 | 3.8 | 13406.72 | 1264.52 | gpuarrays/random (2) | 16.49 | 0.53 | 3.2 | 1566.54 | 1853.92 | gpuarrays/statistics (11) | 62.19 | 3.45 | 5.5 | 7306.38 | 1033.64 | gpuarrays/base (5) | 33.01 | 2.12 | 6.4 | 3858.07 | 1474.92 | gpuarrays/reductions/== isequal (8) | failed at 2023-03-17T15:11:45.633 From worker 3: ERROR: Exception handler triggered on unmanaged thread. From worker 3: From worker 3: [21278] signal (10.1): Bus error: 10 From worker 3: in expression starting at none:0 From worker 3: unknown function (ip: 0x12f624208) From worker 3: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line) From worker 3: Allocations: 161484420 (Pool: 161334323; Big: 150097); GC: 231 From worker 3: ERROR: Exception handler triggered on unmanaged thread. gpuarrays/linalg/mul!/matrix-matrix (3) | failed at 2023-03-17T15:12:09.672 Worker 3 terminated. Unhandled Task ERROR: EOFError: read end of file Stacktrace: [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64) @ Base ./stream.jl:947 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64) @ Base ./stream.jl:955 [3] unsafe_read @ ./io.jl:761 [inlined] [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64) @ Base ./io.jl:760 [5] read! 
@ ./io.jl:762 [inlined] [6] deserialize_hdr_raw @ ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined] [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})() @ Distributed ./task.jl:514 gpuarrays/reductions/mapreducedim! (2) | 84.72 | 3.62 | 4.3 | 16261.15 | 2210.14 | gpuarrays/reductions/reduce (11) | 103.73 | 5.28 | 5.1 | 19516.37 | 1533.66 | gpuarrays/reductions/sum prod (5) | 95.40 | 4.60 | 4.8 | 17690.11 | 1986.80 | gpuarrays/reductions/mapreduce (9) | 166.44 | 7.97 | 4.8 | 30266.48 | 1678.20 | gpuarrays/reductions/minimum maximum extrema (4) | 202.07 | 8.46 | 4.2 | 25941.04 | 1745.56 | gpuarrays/broadcasting (6) | 157.75 | 6.15 | 3.9 | 20279.36 | 1814.19 | Testing finished in 4 minutes, 40 seconds, 653 milliseconds Worker 8 failed running test gpuarrays/reductions/== isequal: Some tests did not pass: 242 passed, 0 failed, 6 errored, 0 broken. 
### isequal stacktraces removed gpuarrays/linalg/mul!/matrix-matrix: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(3) Test Summary: | Pass Error Total Time Overall | 5464 7 5471 metal | 127 127 mps | 5 5 execution | 17 17 gpuarrays/indexing scalar | 398 398 profiling | 22 22 array | 190 190 device/intrinsics | 25 25 gpuarrays/interface | 7 7 gpuarrays/math/power | 60 60 gpuarrays/indexing find | 45 45 gpuarrays/uniformscaling | 56 56 examples | 3 3 gpuarrays/reductions/any all count | 101 101 gpuarrays/math/intrinsics | 10 10 gpuarrays/indexing multidimensional | 42 42 gpuarrays/linalg/mul!/vector-matrix | 140 140 gpuarrays/reductions/reducedim! | 160 160 gpuarrays/reductions/mapreducedim!_large | 40 40 gpuarrays/constructors | 770 770 gpuarrays/linalg/norm | 264 264 gpuarrays/linalg | 233 233 gpuarrays/random | 50 50 gpuarrays/statistics | 52 52 gpuarrays/base | 73 73 gpuarrays/reductions/== isequal | 242 6 248 gpuarrays/linalg/mul!/matrix-matrix | 1 1 gpuarrays/reductions/mapreducedim! | 260 260 gpuarrays/reductions/reduce | 220 220 gpuarrays/reductions/sum prod | 636 636 gpuarrays/reductions/mapreduce | 330 330 gpuarrays/reductions/minimum maximum extrema | 555 555 gpuarrays/broadcasting | 331 331 FAILURE ### "== isequal" stacktrace removed Error in testset gpuarrays/linalg/mul!/matrix-matrix: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(3) ERROR: LoadError: Test run finished with errors in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:394 ERROR: Package Metal errored during testing ```

maleadt commented 1 year ago

> From worker 3: ERROR: Exception handler triggered on unmanaged thread.
> From worker 3:
> From worker 3: [21278] signal (10.1): Bus error: 10
> From worker 3: in expression starting at none:0
> From worker 3: unknown function (ip: 0x12f624208)
> From worker 3: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
> From worker 3: Allocations: 161484420 (Pool: 161334323; Big: 150097); GC: 231
> From worker 3: ERROR: Exception handler triggered on unmanaged thread.

Did you get this on the latest master, including https://github.com/JuliaGPU/Metal.jl/pull/140?

christiangnrd commented 1 year ago

Yes.

I was just about to post one where the failure happened twice.

Here it is:

``` (@v1.9) pkg> test Metal Testing Metal Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_GGekTX/Project.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [0c68f7d7] GPUArrays v8.6.4 [dde4c033] Metal v0.2.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v0.1.0 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [de0858da] Printf `@stdlib/Printf` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [8dfed614] Test `@stdlib/Test` Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_GGekTX/Manifest.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [fa961155] CEnum v0.4.2 [e2ba6199] ExprTools v0.1.9 [0c68f7d7] GPUArrays v8.6.4 [46192b85] GPUArraysCore v0.1.4 [61eb1bfa] GPUCompiler v0.18.0 [692b3bcd] JLLWrappers v1.4.1 [682c06a0] JSON v0.21.3 [929cbde3] LLVM v4.16.0 [dde4c033] Metal v0.2.0 `~/.julia/dev/Metal` [e86c9b32] ObjectiveC v0.1.0 [69de0a69] Parsers v2.5.8 [21216c6a] Preferences v1.3.0 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.0 [66db9d55] SnoopPrecompile v1.0.3 [a759f4b9] TimerOutputs v0.5.22 ⌅ [dad2f222] LLVMExtra_jll v0.0.16+2 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [0dad84c5] ArgTools v1.1.1 `@stdlib/ArgTools` [56f22d72] Artifacts `@stdlib/Artifacts` [2a0f44e3] Base64 `@stdlib/Base64` [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [f43a241f] Downloads v1.6.0 `@stdlib/Downloads` [7b1f6079] FileWatching `@stdlib/FileWatching` [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils` [4af54fe1] LazyArtifacts `@stdlib/LazyArtifacts` [b27032c2] LibCURL v0.6.3 `@stdlib/LibCURL` [76f85450] LibGit2 `@stdlib/LibGit2` [8f399da3] Libdl `@stdlib/Libdl` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [56ddb016] Logging `@stdlib/Logging` 
[d6f4376e] Markdown `@stdlib/Markdown` [a63ad114] Mmap `@stdlib/Mmap` [ca575930] NetworkOptions v1.2.0 `@stdlib/NetworkOptions` [44cfe95a] Pkg v1.9.0 `@stdlib/Pkg` [de0858da] Printf `@stdlib/Printf` [9abbd945] Profile `@stdlib/Profile` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [ea8e919c] SHA v0.7.0 `@stdlib/SHA` [9e88b42a] Serialization `@stdlib/Serialization` [6462fe0b] Sockets `@stdlib/Sockets` [2f01184e] SparseArrays `@stdlib/SparseArrays` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [fa267f1f] TOML v1.0.3 `@stdlib/TOML` [a4e569a6] Tar v1.10.0 `@stdlib/Tar` [8dfed614] Test `@stdlib/Test` [cf7118a7] UUIDs `@stdlib/UUIDs` [4ec0a83e] Unicode `@stdlib/Unicode` [e66e0078] CompilerSupportLibraries_jll v1.0.2+0 `@stdlib/CompilerSupportLibraries_jll` [deac9b47] LibCURL_jll v7.84.0+0 `@stdlib/LibCURL_jll` [29816b5a] LibSSH2_jll v1.10.2+0 `@stdlib/LibSSH2_jll` [c8ffd9c3] MbedTLS_jll v2.28.2+0 `@stdlib/MbedTLS_jll` [14a3606d] MozillaCACerts_jll v2022.10.11 `@stdlib/MozillaCACerts_jll` [4536629a] OpenBLAS_jll v0.3.21+4 `@stdlib/OpenBLAS_jll` [bea87d4a] SuiteSparse_jll v5.10.1+6 `@stdlib/SuiteSparse_jll` [83775a58] Zlib_jll v1.2.13+0 `@stdlib/Zlib_jll` [8e850b90] libblastrampoline_jll v5.4.0+0 `@stdlib/libblastrampoline_jll` [8e850ede] nghttp2_jll v1.48.0+0 `@stdlib/nghttp2_jll` [3f19e933] p7zip_jll v17.4.0+0 `@stdlib/p7zip_jll` Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. Testing Running tests... ┌ Info: System information: │ macOS 13.2.1, Darwin 21.4.0 │ │ Toolchain: │ - Julia: 1.9.0-rc1 │ - LLVM: 14.0.6 │ │ 1 device: └ - Apple M2 Max (64.000 KiB allocated) ┌ Info: Using Metal LLVM back-end from /Users/christian/.julia/artifacts/3c74b0072cc694992a9d90b5778fb28f7ec53251/bin: │ LLVM (http://llvm.org/): │ LLVM version 14.0.0 │ Optimized build. │ Default target: aarch64-apple-darwin22.3.0 └ Host CPU: cyclone [ Info: Running 8 tests in parallel. 
If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable. | | ---------------- CPU ---------------- | Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | From worker 2: ┌ Warning: Metal does not support Float64 values, try using Float32 instead From worker 2: └ @ Metal ~/.julia/dev/Metal/src/array.jl:38 metal (5) | 3.10 | 0.04 | 1.3 | 227.58 | 481.00 | mps (6) | 3.41 | 0.07 | 1.9 | 439.31 | 489.80 | From worker 10: 2023-03-17 15:40:44.404 julia[40986:100357] Metal GPU Frame Capture Enabled execution (4) | 13.27 | 0.36 | 2.7 | 1272.86 | 601.64 | gpuarrays/indexing scalar (9) | 18.37 | 0.57 | 3.1 | 1880.43 | 601.45 | From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_btXhvU/test.gputrace/julia_capture_1.gputrace/ profiling (10) | 11.16 | 0.37 | 3.3 | 970.22 | 537.20 | array (2) | 23.94 | 0.78 | 3.3 | 2775.94 | 748.16 | device/intrinsics (8) | 34.83 | 0.84 | 2.4 | 3231.27 | 683.53 | gpuarrays/interface (8) | 2.32 | 0.12 | 5.1 | 356.09 | 691.77 | gpuarrays/math/power (4) | 31.17 | 1.77 | 5.7 | 3803.28 | 860.91 | gpuarrays/indexing find (2) | 31.97 | 1.98 | 6.2 | 4071.75 | 938.06 | gpuarrays/reductions/any all count (8) | 20.71 | 0.98 | 4.7 | 2528.35 | 756.08 | examples (3) | 61.53 | 0.00 | 0.0 | 11.03 | 420.02 | gpuarrays/uniformscaling (2) | 6.78 | 0.14 | 2.0 | 643.77 | 966.45 | gpuarrays/math/intrinsics (2) | 2.21 | 0.05 | 2.3 | 297.02 | 978.30 | gpuarrays/linalg/mul!/vector-matrix (9) | 48.51 | 1.32 | 2.7 | 5703.02 | 853.27 | gpuarrays/indexing multidimensional (11) | 38.40 | 1.47 | 3.8 | 3394.09 | 644.81 | gpuarrays/statistics (9) | 34.56 | 2.09 | 6.1 | 7192.49 | 1136.78 | gpuarrays/linalg/norm (2) | 42.01 | 2.17 | 5.2 | 7277.53 | 1282.61 | gpuarrays/constructors (9) | 11.78 | 0.39 | 3.3 | 1806.74 | 1222.06 | gpuarrays/random (2) | 9.02 | 0.34 | 3.8 | 1666.19 | 1302.97 | gpuarrays/linalg/mul!/matrix-matrix (3) | 67.41 | 2.14 | 3.2 | 
10038.10 | 1014.06 | gpuarrays/base (9) | 15.45 | 0.84 | 5.5 | 3628.85 | 1493.89 | gpuarrays/reductions/== isequal (2) | failed at 2023-03-17T15:43:44.218 From worker 8: ERROR: Exception handler triggered on unmanaged thread. From worker 8: From worker 8: [40975] signal (10.1): Bus error: 10 From worker 8: in expression starting at none:1 From worker 8: unknown function (ip: 0x13740c218) From worker 8: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line) From worker 8: Allocations: 162467620 (Pool: 162357150; Big: 110470); GC: 206 From worker 8: ERROR: Exception handler triggered on unmanaged thread. gpuarrays/reductions/mapreducedim!_large (8) | failed at 2023-03-17T15:43:44.495 Worker 8 terminated. Unhandled Task ERROR: EOFError: read end of file Stacktrace: [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64) @ Base ./stream.jl:947 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64) @ Base ./stream.jl:955 [3] unsafe_read @ ./io.jl:761 [inlined] [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64) @ Base ./io.jl:760 [5] read! 
@ ./io.jl:762 [inlined] [6] deserialize_hdr_raw @ ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined] [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})() @ Distributed ./task.jl:514 gpuarrays/reductions/minimum maximum extrema (4) | 152.14 | 7.78 | 5.1 | 25941.03 | 1495.75 | gpuarrays/reductions/reducedim! (5) | 206.39 | 3.56 | 1.7 | 13272.69 | 970.84 | From worker 3: ERROR: Exception handler triggered on unmanaged thread. From worker 3: From worker 3: [40970] signal (10.1): Bus error: 10 From worker 3: in expression starting at none:1 From worker 3: unknown function (ip: 0x12144c208) From worker 3: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line) From worker 3: Allocations: 348978119 (Pool: 348676755; Big: 301364); GC: 503 From worker 3: ERROR: Exception handler triggered on unmanaged thread. gpuarrays/broadcasting (3) | failed at 2023-03-17T15:44:08.150 Worker 3 terminated. Unhandled Task ERROR: EOFError: read end of file Stacktrace: [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64) @ Base ./stream.jl:947 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64) @ Base ./stream.jl:955 [3] unsafe_read @ ./io.jl:761 [inlined] [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64) @ Base ./io.jl:760 [5] read! 
@ ./io.jl:762 [inlined] [6] deserialize_hdr_raw @ ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined] [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})() @ Distributed ./task.jl:514 gpuarrays/reductions/mapreducedim! (9) | 94.11 | 4.21 | 4.5 | 17254.67 | 1515.70 | gpuarrays/linalg (6) | 231.08 | 4.18 | 1.8 | 13406.81 | 1093.69 | gpuarrays/reductions/reduce (12) | 102.47 | 5.56 | 5.4 | 21430.14 | 1443.39 | gpuarrays/reductions/sum prod (13) | 103.00 | 4.61 | 4.5 | 20877.95 | 1417.56 | gpuarrays/reductions/mapreduce (11) | 252.76 | 6.22 | 2.5 | 30905.73 | 1413.80 | Testing finished in 5 minutes, 25 seconds, 781 milliseconds ## isequal stuff removed gpuarrays/reductions/mapreducedim!_large: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(8) gpuarrays/broadcasting: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(3) Test Summary: | Pass Error Total Time Overall | 5453 8 5461 metal | 127 127 mps | 5 5 execution | 17 17 gpuarrays/indexing scalar | 398 398 profiling | 22 22 array | 190 190 device/intrinsics | 25 25 gpuarrays/interface | 7 7 gpuarrays/math/power | 60 60 gpuarrays/indexing find | 45 45 gpuarrays/reductions/any all count | 101 101 examples | 3 3 gpuarrays/uniformscaling | 56 56 gpuarrays/math/intrinsics | 10 10 gpuarrays/linalg/mul!/vector-matrix | 140 140 gpuarrays/indexing multidimensional | 42 42 gpuarrays/statistics | 52 52 
gpuarrays/linalg/norm | 264 264 gpuarrays/constructors | 770 770 gpuarrays/random | 50 50 gpuarrays/linalg/mul!/matrix-matrix | 360 360 gpuarrays/base | 73 73 gpuarrays/reductions/== isequal | 242 6 248 gpuarrays/reductions/mapreducedim!_large | 1 1 gpuarrays/reductions/minimum maximum extrema | 555 555 gpuarrays/reductions/reducedim! | 160 160 gpuarrays/broadcasting | 1 1 gpuarrays/reductions/mapreducedim! | 260 260 gpuarrays/linalg | 233 233 gpuarrays/reductions/reduce | 220 220 gpuarrays/reductions/sum prod | 636 636 gpuarrays/reductions/mapreduce | 330 330 FAILURE ## isequal stuff removed Error During Test at none:1 Got exception outside of a @test ProcessExitedException(8) Error in testset gpuarrays/broadcasting: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(3) ERROR: LoadError: Test run finished with errors in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:394 ERROR: Package Metal errored during testing ```

Either these output dumps are displaying two different bugs, or the same one gets expressed in different ways.

maleadt commented 1 year ago

That's surprising. `ERROR: Exception handler triggered on unmanaged thread.` comes from Julia's runtime, but the only place where we execute Julia code on a potentially unmanaged thread is the Objective-C blocks used as command buffer callbacks, so I figured that adding a try/catch there would solve the issue.
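
To make the try/catch idea concrete, here is a minimal sketch in plain Julia. The `guarded` helper is hypothetical and only for illustration; it is not Metal.jl's actual callback code. It only handles Julia exceptions raised inside the callback body.

```julia
# Hypothetical helper, for illustration only (not Metal.jl's actual code).
# Wraps `f` so that a Julia exception thrown inside it is reported
# instead of propagating out of the callback.
function guarded(f)
    return function (args...)
        try
            f(args...)
        catch err
            print(stderr, "callback failed: ")
            showerror(stderr, err)
            println(stderr)
        end
        return nothing
    end
end

ok = guarded(() -> 1 + 1)
bad = guarded(() -> error("boom"))

ok()   # runs normally and returns `nothing`
bad()  # reports the error on stderr and returns `nothing`
```

The wrapper always returns `nothing`, so the caller never sees the exception.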

christiangnrd commented 1 year ago

I reran it just to be sure it was on main and it happened again, but it also caught an error that didn't cause any tests to fail.

I found this about bus error 10. This answer about bus error mentions that it's a hardware error that can't be caught in the traditional sense.

Output:

``` (jl_HMeEOW) pkg> test Metal Testing Metal Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_qEqgru/Project.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [0c68f7d7] GPUArrays v8.6.4 [dde4c033] Metal v0.2.0 `https://github.com/JuliaGPU/Metal.jl.git#main` [e86c9b32] ObjectiveC v0.1.0 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra` [de0858da] Printf `@stdlib/Printf` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [8dfed614] Test `@stdlib/Test` Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_qEqgru/Manifest.toml` [79e6a3ab] Adapt v3.6.1 [6e4b80f9] BenchmarkTools v1.3.2 [fa961155] CEnum v0.4.2 [e2ba6199] ExprTools v0.1.9 [0c68f7d7] GPUArrays v8.6.4 [46192b85] GPUArraysCore v0.1.4 [61eb1bfa] GPUCompiler v0.18.0 [692b3bcd] JLLWrappers v1.4.1 [682c06a0] JSON v0.21.3 [929cbde3] LLVM v4.16.0 [dde4c033] Metal v0.2.0 `https://github.com/JuliaGPU/Metal.jl.git#main` [e86c9b32] ObjectiveC v0.1.0 [69de0a69] Parsers v2.5.8 [21216c6a] Preferences v1.3.0 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.0 [66db9d55] SnoopPrecompile v1.0.3 [a759f4b9] TimerOutputs v0.5.22 ⌅ [dad2f222] LLVMExtra_jll v0.0.16+2 [0418c028] Metal_LLVM_Tools_jll v0.3.0+2 [65323cdd] cmt_jll v0.2.0+0 [0dad84c5] ArgTools v1.1.1 `@stdlib/ArgTools` [56f22d72] Artifacts `@stdlib/Artifacts` [2a0f44e3] Base64 `@stdlib/Base64` [ade2ca70] Dates `@stdlib/Dates` [8ba89e20] Distributed `@stdlib/Distributed` [f43a241f] Downloads v1.6.0 `@stdlib/Downloads` [7b1f6079] FileWatching `@stdlib/FileWatching` [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils` [4af54fe1] LazyArtifacts `@stdlib/LazyArtifacts` [b27032c2] LibCURL v0.6.3 `@stdlib/LibCURL` [76f85450] LibGit2 `@stdlib/LibGit2` [8f399da3] Libdl `@stdlib/Libdl` [37e2e46d] LinearAlgebra 
`@stdlib/LinearAlgebra` [56ddb016] Logging `@stdlib/Logging` [d6f4376e] Markdown `@stdlib/Markdown` [a63ad114] Mmap `@stdlib/Mmap` [ca575930] NetworkOptions v1.2.0 `@stdlib/NetworkOptions` [44cfe95a] Pkg v1.9.0 `@stdlib/Pkg` [de0858da] Printf `@stdlib/Printf` [9abbd945] Profile `@stdlib/Profile` [3fa0cd96] REPL `@stdlib/REPL` [9a3f8284] Random `@stdlib/Random` [ea8e919c] SHA v0.7.0 `@stdlib/SHA` [9e88b42a] Serialization `@stdlib/Serialization` [6462fe0b] Sockets `@stdlib/Sockets` [2f01184e] SparseArrays `@stdlib/SparseArrays` [10745b16] Statistics v1.9.0 `@stdlib/Statistics` [fa267f1f] TOML v1.0.3 `@stdlib/TOML` [a4e569a6] Tar v1.10.0 `@stdlib/Tar` [8dfed614] Test `@stdlib/Test` [cf7118a7] UUIDs `@stdlib/UUIDs` [4ec0a83e] Unicode `@stdlib/Unicode` [e66e0078] CompilerSupportLibraries_jll v1.0.2+0 `@stdlib/CompilerSupportLibraries_jll` [deac9b47] LibCURL_jll v7.84.0+0 `@stdlib/LibCURL_jll` [29816b5a] LibSSH2_jll v1.10.2+0 `@stdlib/LibSSH2_jll` [c8ffd9c3] MbedTLS_jll v2.28.2+0 `@stdlib/MbedTLS_jll` [14a3606d] MozillaCACerts_jll v2022.10.11 `@stdlib/MozillaCACerts_jll` [4536629a] OpenBLAS_jll v0.3.21+4 `@stdlib/OpenBLAS_jll` [bea87d4a] SuiteSparse_jll v5.10.1+6 `@stdlib/SuiteSparse_jll` [83775a58] Zlib_jll v1.2.13+0 `@stdlib/Zlib_jll` [8e850b90] libblastrampoline_jll v5.4.0+0 `@stdlib/libblastrampoline_jll` [8e850ede] nghttp2_jll v1.48.0+0 `@stdlib/nghttp2_jll` [3f19e933] p7zip_jll v17.4.0+0 `@stdlib/p7zip_jll` Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. Testing Running tests... ┌ Info: System information: │ macOS 13.2.1, Darwin 21.4.0 │ │ Toolchain: │ - Julia: 1.9.0-rc1 │ - LLVM: 14.0.6 │ │ 1 device: └ - Apple M2 Max (64.000 KiB allocated) ┌ Info: Using Metal LLVM back-end from /Users/christian/.julia/artifacts/3c74b0072cc694992a9d90b5778fb28f7ec53251/bin: │ LLVM (http://llvm.org/): │ LLVM version 14.0.0 │ Optimized build. 
│ Default target: aarch64-apple-darwin22.3.0 └ Host CPU: cyclone [ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable. | | ---------------- CPU ---------------- | Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | From worker 2: ┌ Warning: Metal does not support Float64 values, try using Float32 instead From worker 2: └ @ Metal ~/.julia/packages/Metal/umUcZ/src/array.jl:38 metal (5) | 1.69 | 0.02 | 1.3 | 218.45 | 466.16 | From worker 10: 2023-03-17 15:56:51.534 julia[49507:121859] Metal GPU Frame Capture Enabled execution (4) | 7.70 | 0.24 | 3.1 | 1265.00 | 561.81 | mps (6) | 8.35 | 0.27 | 3.2 | 1379.98 | 632.30 | gpuarrays/indexing scalar (9) | 10.96 | 0.33 | 3.0 | 1830.66 | 624.69 | From worker 10: [ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_8jx8Bv/test.gputrace/julia_capture_1.gputrace/ profiling (10) | 6.62 | 0.23 | 3.4 | 969.57 | 555.14 | array (2) | 12.00 | 0.36 | 3.0 | 2008.94 | 642.00 | device/intrinsics (8) | 17.89 | 0.59 | 3.3 | 3250.17 | 750.98 | gpuarrays/interface (8) | 1.63 | 0.10 | 6.2 | 356.41 | 757.48 | From worker 11: From worker 11: [49631] signal (10.1): Bus error: 10 From worker 11: in expression starting at none:1 From worker 11: jl_gc_pool_alloc_noinline at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line) From worker 11: jl_init_root_task at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line) From worker 11: ijl_adopt_thread at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line) From worker 11: unknown function (ip: 0x146c5829b) From worker 11: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line) From worker 11: 
Allocations: 30336539 (Pool: 30312481; Big: 24058); GC: 42 gpuarrays/math/power (6) | 19.73 | 1.15 | 5.8 | 3674.44 | 867.89 | gpuarrays/indexing find (2) | 17.20 | 1.05 | 6.1 | 4075.73 | 846.39 | gpuarrays/reductions/any all count (8) | 9.87 | 0.48 | 4.8 | 2516.14 | 816.03 | gpuarrays/uniformscaling (2) | 4.58 | 0.10 | 2.2 | 641.02 | 875.16 | gpuarrays/indexing multidimensional (11) | 18.18 | 0.69 | 3.8 | 3323.47 | 716.05 | gpuarrays/math/intrinsics (11) | 1.60 | 0.03 | 1.9 | 275.58 | 729.98 | examples (3) | 37.20 | 0.00 | 0.0 | 11.04 | 420.95 | gpuarrays/linalg/mul!/vector-matrix (9) | 31.19 | 0.90 | 2.9 | 5703.49 | 831.89 | gpuarrays/reductions/reducedim! (5) | 62.33 | 2.80 | 4.5 | 13218.14 | 1184.16 | gpuarrays/linalg (4) | 66.32 | 3.37 | 5.1 | 12616.14 | 1187.09 | From worker 4: ERROR: Exception handler triggered on unmanaged thread. From worker 4: From worker 4: [49494] signal (10.1): Bus error: 10 From worker 4: in expression starting at none:1 From worker 4: unknown function (ip: 0x128630218) From worker 4: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line) From worker 4: Allocations: 225362435 (Pool: 225197595; Big: 164840); GC: 335 From worker 4: ERROR: Exception handler triggered on unmanaged thread. gpuarrays/random (4) | failed at 2023-03-17T15:58:04.903 Worker 4 terminated. gpuarrays/linalg/norm (11) | 41.31 | 2.24 | 5.4 | 7506.83 | 1086.61 | Unhandled Task ERROR: EOFError: read end of file Stacktrace: [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64) @ Base ./stream.jl:947 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64) @ Base ./stream.jl:955 [3] unsafe_read @ ./io.jl:761 [inlined] [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64) @ Base ./io.jl:760 [5] read! 
@ ./io.jl:762 [inlined] [6] deserialize_hdr_raw @ ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined] [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool) @ Distributed ~/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})() @ Distributed ./task.jl:514 gpuarrays/constructors (5) | 15.55 | 0.44 | 2.8 | 2028.87 | 1320.22 | gpuarrays/statistics (3) | 43.41 | 2.44 | 5.6 | 8380.33 | 943.16 | gpuarrays/reductions/mapreducedim!_large (8) | 51.65 | 0.94 | 1.8 | 8317.50 | 1851.16 | gpuarrays/linalg/mul!/matrix-matrix (2) | 66.42 | 1.90 | 2.9 | 8863.59 | 1153.73 | gpuarrays/base (12) | 41.14 | 2.06 | 5.0 | 5157.23 | 898.88 | gpuarrays/reductions/== isequal (11) | failed at 2023-03-17T15:59:43.201 gpuarrays/reductions/mapreducedim! 
(3) | 125.84 | 4.65 | 3.7 | 18023.77 | 1261.88 | gpuarrays/reductions/minimum maximum extrema (6) | 191.01 | 9.40 | 4.9 | 25877.69 | 1455.48 | gpuarrays/reductions/reduce (8) | 152.66 | 6.82 | 4.5 | 18878.99 | 1851.16 | gpuarrays/reductions/sum prod (2) | 144.71 | 6.61 | 4.6 | 19135.49 | 1581.91 | gpuarrays/broadcasting (5) | 165.64 | 9.12 | 5.5 | 19953.40 | 1446.86 | gpuarrays/reductions/mapreduce (9) | 227.58 | 8.55 | 3.8 | 30215.41 | 1343.83 | Testing finished in 4 minutes, 31 seconds, 848 milliseconds gpuarrays/random: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(4) Test Summary: | Pass Error Total Time Overall | 5644 7 5651 metal | 127 127 execution | 17 17 mps | 5 5 gpuarrays/indexing scalar | 398 398 profiling | 22 22 array | 60 60 device/intrinsics | 25 25 gpuarrays/interface | 7 7 gpuarrays/math/power | 60 60 gpuarrays/indexing find | 45 45 gpuarrays/reductions/any all count | 101 101 gpuarrays/uniformscaling | 56 56 gpuarrays/indexing multidimensional | 42 42 gpuarrays/math/intrinsics | 10 10 examples | 3 3 gpuarrays/linalg/mul!/vector-matrix | 140 140 gpuarrays/reductions/reducedim! | 160 160 gpuarrays/linalg | 233 233 gpuarrays/random | 1 1 gpuarrays/linalg/norm | 264 264 gpuarrays/constructors | 770 770 gpuarrays/statistics | 52 52 gpuarrays/reductions/mapreducedim!_large | 40 40 gpuarrays/linalg/mul!/matrix-matrix | 360 360 gpuarrays/base | 73 73 gpuarrays/reductions/== isequal | 242 6 248 gpuarrays/reductions/mapreducedim! | 260 260 gpuarrays/reductions/minimum maximum extrema | 555 555 gpuarrays/reductions/reduce | 220 220 gpuarrays/reductions/sum prod | 636 636 gpuarrays/broadcasting | 331 331 gpuarrays/reductions/mapreduce | 330 330 FAILURE Error in testset gpuarrays/random: Error During Test at none:1 Got exception outside of a @test ProcessExitedException(4) ```

maleadt commented 1 year ago

    From worker 11: [49631] signal (10.1): Bus error: 10
    From worker 11: in expression starting at none:1
    From worker 11: jl_gc_pool_alloc_noinline at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
    From worker 11: jl_init_root_task at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
    From worker 11: ijl_adopt_thread at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
    From worker 11: unknown function (ip: 0x146c5829b)
    From worker 11: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
    From worker 11: Allocations: 30336539 (Pool: 30312481; Big: 24058); GC: 42

Hmm, that one I had encountered before switching to a callback that doesn't execute on an unmanaged thread, https://github.com/JuliaGPU/Metal.jl/commit/2925d5becdb9baec79b2c602d20862edf50e2946. It'd be too bad if we had to switch back to that, but it also wouldn't really sacrifice any functionality. You can test that out by disabling this piece of code (that's currently only enabled on 1.9): https://github.com/JuliaGPU/Metal.jl/blob/8f51c87c34654f4f1b22822b41b767d046e3e2e9/lib/mtl/command_buf.jl#L171-L190
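
For readers unfamiliar with the idea of a callback that avoids running Julia code on a foreign thread, here is a minimal sketch of one common Julia pattern, `Base.AsyncCondition` plus libuv's `uv_async_send`. This is an assumption for illustration and not necessarily what the linked commit does. Names such as `completed` and `waiter` are made up for the example.

```julia
# Sketch only: a common Julia pattern, not necessarily what Metal.jl does.
# The foreign side would only call `uv_async_send`, which libuv documents as
# safe to call from any thread; all Julia code then runs in an ordinary task.
cond = Base.AsyncCondition()
completed = Ref(false)

# Ordinary Julia task that reacts to the notification.
waiter = @async begin
    wait(cond)
    completed[] = true
end

# Stand-in for the foreign completion handler: here the notification is
# triggered from Julia itself so that the example is self-contained.
ccall(:uv_async_send, Cint, (Ptr{Cvoid},), cond.handle)

wait(waiter)
close(cond)
```

In real use the `ccall` would be the only thing the C-callable handler does, so no Julia allocation or task setup happens on the foreign thread.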

Let me know if that reveals anything else.

christiangnrd commented 1 year ago

Disabling that code doesn't seem to have made a difference. I can pretty consistently get the error by just running the tests simultaneously on more than one terminal window.

I don't know if this would even translate to errors when just using the package.

pitsianis commented 1 year ago

Is there any progress on this? Is there a way for an ignorant outsider like me to help?

maleadt commented 1 year ago

Do you have a good reproducer? Trying to minimize that would be valuable.

pitsianis commented 1 year ago

This is an unorthodox setup: it includes the same file twice. But

1) it is less than 50 lines;
2) it seems to fail from both the REPL and the command line, on every other run.

Run with `julia -tauto --project=. unmanaged-thread-bug.jl`

# unmanaged-thread-bug.jl file
include("metal-bitonic.jl")
include("metal-bitonic.jl")

And the second file, `metal-bitonic.jl`:

using Metal

@inbounds function mtlxchange!(a, jj, kk)
    i = thread_position_in_grid_1d() - 1

    ij = i ⊻ jj
    if ij > i
        if (i & kk) == 0 && a[i+1] > a[ij+1]
            a[i+1], a[ij+1] = a[ij+1], a[i+1]
        end
        if (i & kk) != 0 && a[i+1] < a[ij+1]
            a[i+1], a[ij+1] = a[ij+1], a[i+1]
        end
    end
    nothing
end

function mtlbitonic!(a)
    n = length(a)
    q = Int(log2(n))
    for k = 1:q
        kk = 1 << k
        for j = k-1:-1:0
            jj = 1 << j
            @metal threads = 2^10 groups = n ÷ 2^10 mtlxchange!(a, jj, kk)
        end
    end
    nothing
end

function test(n)

    for k = 1:4
        a0 = rand(Float32, n)
        b = MtlArray(similar(a0))
        copyto!(b, a0)
        mtlbitonic!(b)
        @assert Array(b) == sort(a0)
    end
end

for k = 1:100
    test(2^22)
end
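
When minimizing a reproducer like this, it can help to check the exchange network itself without the GPU. The following is a CPU-only sketch of the same compare-and-swap pattern. The names `cpu_xchange!` and `cpu_bitonic!` are hypothetical and not part of the original files. Like the kernel version, it assumes the length is a power of two.

```julia
# CPU reference for the bitonic exchange step (hypothetical helper names).
# Each pair (i, i ⊻ jj) is visited once, so a sequential loop gives the
# same result as one GPU thread per index.
function cpu_xchange!(a, jj, kk)
    for i in 0:length(a)-1
        ij = i ⊻ jj
        if ij > i
            ascending = (i & kk) == 0
            x, y = a[i+1], a[ij+1]
            if (ascending && x > y) || (!ascending && x < y)
                a[i+1], a[ij+1] = y, x
            end
        end
    end
    return nothing
end

function cpu_bitonic!(a)
    n = length(a)
    @assert ispow2(n) "length must be a power of two"
    q = trailing_zeros(n)  # log2(n) for a power of two
    for k in 1:q
        kk = 1 << k
        for j in k-1:-1:0
            cpu_xchange!(a, 1 << j, kk)
        end
    end
    return a
end

a0 = rand(Float32, 2^10)
sorted = cpu_bitonic!(copy(a0))
```

If the CPU version sorts correctly, a failure in the GPU run is unlikely to come from the exchange logic itself.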

vchuravy commented 1 year ago

~That looks like you potentially hit a GC safepoint?~

maleadt commented 1 year ago

> That looks like you potentially hit a GC safepoint?

That should just work on an adopted thread, right?

I'm not sure it's a safepoint, though. Most of the time the backtrace is truncated, but sometimes it points to alloc:

    From worker 11: [49631] signal (10.1): Bus error: 10
    From worker 11: in expression starting at none:1
    From worker 11: jl_gc_pool_alloc_noinline at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
    From worker 11: jl_init_root_task at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
    From worker 11: ijl_adopt_thread at /Users/christian/.julia/juliaup/julia-1.9.0-rc1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
    From worker 11: unknown function (ip: 0x146c5829b)
    From worker 11: MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
    From worker 11: Allocations: 30336539 (Pool: 30312481; Big: 24058); GC: 42

I'm not sure what a bus error on macOS during malloc signifies, though. Or maybe there's an issue with thread adoption, since `jl_gc_pool_alloc_inner` accesses quite a bit of task-local storage.

maleadt commented 1 year ago

@pitsianis I can't reproduce the issue with your MWE. If you're familiar with lldb, could you maybe run Julia under it and see if you can get some more information, even if only a better backtrace? Note that I should have worked around this issue on Metal.jl#master, so make sure you're using an older version.

pitsianis commented 1 year ago

This is the complete output from the command line

julia --project=. bug.jl 
ERROR: Exception handler triggered on unmanaged thread.

[20749] signal (10.2): Bus error: 10
in expression starting at /Users/nikos/projects/quicksort.jl/bug.jl:53
unknown function (ip: 0x11dff22e5)
MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
-[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:] at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
-[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:] at /System/Library/PrivateFrameworks/IOGPU.framework/Versions/A/IOGPU (unknown line)
-[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
IOGPUNotificationQueueDispatchAvailableCompletionNotifications at /System/Library/PrivateFrameworks/IOGPU.framework/Versions/A/IOGPU (unknown line)
__IOGPUNotificationQueueSetDispatchQueue_block_invoke at /System/Library/PrivateFrameworks/IOGPU.framework/Versions/A/IOGPU (unknown line)
_dispatch_client_callout4 at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_mach_msg_invoke at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_lane_serial_drain at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_mach_invoke at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_lane_serial_drain at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_lane_invoke at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_lane_serial_drain at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_lane_invoke at /usr/lib/system/libdispatch.dylib (unknown line)
_dispatch_workloop_worker_thread at /usr/lib/system/libdispatch.dylib (unknown line)
_pthread_wqthread at /usr/lib/system/libsystem_pthread.dylib (unknown line)
Allocations: 22532252 (Pool: 22514907; Big: 17345); GC: 151
ERROR: Exception handler triggered on unmanaged thread.
fish: Job 1, 'julia --project=. bug.jl' terminated by signal SIGBUS (Misaligned address error)

pitsianis commented 1 year ago

This is an attempt to use lldb. Please provide explicit instructions and I will post the output here.

(lldb) run
Process 20998 launched: '/usr/local/bin/julia' (x86_64)
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0 (2023-05-07)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/Pipe/OQTzN_WpMrE.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/JLFzf/dM4lv_db1MR.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
(@v1.9) pkg> activate .
  Activating project at `~/projects/quicksort.jl`

julia> include("bug.jl")
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/Reexport/bTpYr_wyYDs.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/compiled/v1.9/Statistics/ERcPL_UQRSp.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/Scratch/ICI1U_wyYDs.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/StaticArraysCore/Tzw28_wyYDs.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/AdaptStaticArraysExt/9bCdf_db1MR.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
Process 20998 stopped
* thread #11, queue = 'com.Metal.CompletionQueueDispatch', stop reason = EXC_BAD_ACCESS (code=2, address=0x108674008)
    frame #0: 0x000000010f2026e5
->  0x10f2026e5: movq   (%rax), %rax
    0x10f2026e8: jmp    0x10f202700
    0x10f2026ea: xorl   %ebp, %ebp
    0x10f2026ec: jmp    0x10f202700
Target 0: (julia) stopped.

maleadt commented 1 year ago

Can you show the output of bt all?

maleadt commented 1 year ago

Also, please try the above after starting julia with ENABLE_GDBLISTENER=1 set in your environment.

EDIT: this also needs a lldb flag, so do:

$ lldb julia
(lldb) settings set plugin.jit-loader.gdb.enable on
(lldb) run
...
(lldb) bt all

pitsianis commented 1 year ago

julia> include("bug.jl")
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/Reexport/bTpYr_wyYDs.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/compiled/v1.9/Statistics/ERcPL_UQRSp.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/Scratch/ICI1U_wyYDs.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/StaticArraysCore/Tzw28_wyYDs.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
warning: (x86_64) /Users/nikos/.julia/compiled/v1.9/AdaptStaticArraysExt/9bCdf_db1MR.dylib empty dSYM file detected, dSYM was created with an executable with no debug info.
2023-05-22 19:00:30.748403+0300 julia[21531:431479] flock failed to lock list file (/var/folders/t8/rp60vdfx4d95cj919rp1ylmh0000gn/C//com.apple.metal/16777235_434/functions.list): errno = 35
2023-05-22 19:00:30.750347+0300 julia[21531:431479] Errors found! Invalidating cache...
Process 21531 stopped
* thread #10, queue = 'com.Metal.CompletionQueueDispatch', stop reason = EXC_BAD_ACCESS (code=2, address=0x108954008)
    frame #0: 0x0000000116878af5
->  0x116878af5: movq   (%rax), %rax
    0x116878af8: jmp    0x116878b10
    0x116878afa: xorl   %ebp, %ebp
    0x116878afc: jmp    0x116878b10
Target 0: (julia) stopped.
(lldb) bt all
  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x0000000109865bd0 libjulia-internal.1.9.dylib`gc_mark_loop + 432
    frame #1: 0x0000000109869ab8 libjulia-internal.1.9.dylib`_jl_gc_collect + 2296
    frame #2: 0x000000010986911b libjulia-internal.1.9.dylib`ijl_gc_collect + 395
    frame #3: 0x00000001098653b9 libjulia-internal.1.9.dylib`jl_gc_pool_alloc_inner + 41
    frame #4: 0x000000010986535f libjulia-internal.1.9.dylib`ijl_gc_pool_alloc + 15
    frame #5: 0x000000011682c98d
    frame #6: 0x000000017c8359ee yPwef_wyYDs.dylib`julia_LLVMType_3633 at type.jl:38
    frame #7: 0x000000017c85021e yPwef_wyYDs.dylib`julia_iterate_3032 at type.jl:244
    frame #8: 0x000000017c8570e5 yPwef_wyYDs.dylib`julia_isghosttype_3007 at type.jl:248
    frame #9: 0x000000017c863c36 yPwef_wyYDs.dylib`japi1_YY.isghosttypeYY.3_2993 at base.jl:131
    frame #10: 0x000000011687a431
  thread #2
    frame #0: 0x00007ff80d9e51ee libsystem_kernel.dylib`kevent + 10
    frame #1: 0x000000010987718b libjulia-internal.1.9.dylib`signal_listener + 667
    frame #2: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #3: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
  thread #3
    frame #0: 0x00007ff80d9e05b2 libsystem_kernel.dylib`mach_msg2_trap + 10
    frame #1: 0x00007ff80d9ee72d libsystem_kernel.dylib`mach_msg2_internal + 78
    frame #2: 0x00007ff80d9e75e4 libsystem_kernel.dylib`mach_msg_overwrite + 692
    frame #3: 0x00007ff80d9e6b13 libsystem_kernel.dylib`mach_msg_server + 308
    frame #4: 0x0000000109875e5d libjulia-internal.1.9.dylib`mach_segv_listener + 29
    frame #5: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #6: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
  thread #4
    frame #0: 0x00007ff80d9e30ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff80da1f758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x0000000138d4e4ef libopenblas64_.0.3.21.dylib`blas_thread_server + 207
    frame #3: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
  thread #5
    frame #0: 0x00007ff80d9e30ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff80da1f758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x0000000138d4e4ef libopenblas64_.0.3.21.dylib`blas_thread_server + 207
    frame #3: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
  thread #6
    frame #0: 0x00007ff80d9e30ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff80da1f758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x0000000138d4e4ef libopenblas64_.0.3.21.dylib`blas_thread_server + 207
    frame #3: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
  thread #7
    frame #0: 0x00007ff80d9e30ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff80da1f758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x0000000138d4e4ef libopenblas64_.0.3.21.dylib`blas_thread_server + 207
    frame #3: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
  thread #8
    frame #0: 0x00007ff80d9e30ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff80da1f758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x0000000138d4e4ef libopenblas64_.0.3.21.dylib`blas_thread_server + 207
    frame #3: 0x00007ff80da1f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff80da1abd3 libsystem_pthread.dylib`thread_start + 15
* thread #10, queue = 'com.Metal.CompletionQueueDispatch', stop reason = EXC_BAD_ACCESS (code=2, address=0x108954008)
  * frame #0: 0x0000000116878af5
  thread #11, queue = 'com.Metal.CommandQueueDispatch'
    frame #0: 0x00007ff80d9e05b2 libsystem_kernel.dylib`mach_msg2_trap + 10
    frame #1: 0x00007ff80d9ee72d libsystem_kernel.dylib`mach_msg2_internal + 78
    frame #2: 0x00007ff810991b7a IOKit`io_connect_method + 405
    frame #3: 0x00007ff810991981 IOKit`IOConnectCallMethod + 186
    frame #4: 0x00007ff82bcc8c35 IOGPU`IOGPUCommandQueueSubmitCommandBuffers + 181
    frame #5: 0x00007ff82bcb9310 IOGPU`-[IOGPUMetalCommandQueue _submitCommandBuffers:count:] + 502
    frame #6: 0x00007ff82bcb90f7 IOGPU`-[IOGPUMetalCommandQueue submitCommandBuffers:count:] + 65
    frame #7: 0x00007ff8175aac86 Metal`-[_MTLCommandQueue _submitAvailableCommandBuffers] + 794
    frame #8: 0x00007ff80d87e033 libdispatch.dylib`_dispatch_client_callout + 8
    frame #9: 0x00007ff80d880b65 libdispatch.dylib`_dispatch_continuation_pop + 463
    frame #10: 0x00007ff80d8927af libdispatch.dylib`_dispatch_source_invoke + 2184
    frame #11: 0x00007ff80d884088 libdispatch.dylib`_dispatch_lane_serial_drain + 393
    frame #12: 0x00007ff80d884d39 libdispatch.dylib`_dispatch_lane_invoke + 366
    frame #13: 0x00007ff80d88f3fc libdispatch.dylib`_dispatch_workloop_worker_thread + 765
    frame #14: 0x00007ff80da1bc55 libsystem_pthread.dylib`_pthread_wqthread + 327
    frame #15: 0x00007ff80da1abbf libsystem_pthread.dylib`start_wqthread + 15
maleadt commented 1 year ago

Oh interesting, thread 1 is running GC here when thread 10 (what I presume is the adopted thread here) throws a memory error.

christiangnrd commented 1 year ago

I was able to reproduce this in Metal 0.4.

maleadt commented 1 year ago

> I was able to reproduce this in Metal 0.4.

That is surprising, as with https://github.com/JuliaGPU/Metal.jl/pull/184 our callbacks shouldn't cause thread adoption. Maybe it happens automatically though, i.e., even if the callback doesn't need the runtime (as it just calls into libuv) it still tries to adopt the thread. @vchuravy do you know?
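
For readers unfamiliar with the idea, here is a minimal sketch of the general "callback only pokes libuv" pattern, using `Base.AsyncCondition` and `uv_async_send`. It is an illustration, not necessarily what #184 implements: the names `cond` and `waiter` and the `:notified` result are made up for the example, and the notification is sent from the main thread to keep the sketch self-contained, whereas in the scenario discussed here it would come from a foreign thread.

```julia
# Sketch only: `cond`, `waiter`, and `:notified` are illustrative names.
cond = Base.AsyncCondition()

# A regular Julia task waits for the notification.
waiter = @async begin
    wait(cond)
    :notified
end

# A foreign callback would only need to make this single libuv call.
# It is issued from the main thread here purely for demonstration.
ccall(:uv_async_send, Cint, (Ptr{Cvoid},), cond.handle)

result = fetch(waiter)
close(cond)
println(result)
```

The point of the pattern is that the notifying side runs no Julia code that allocates; whether the runtime still adopts the calling thread is exactly the open question above.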

maleadt commented 1 year ago

With that insight, a reproducer:

using Metal

# taken from GCBenchmarks
mutable struct ListNode
    key::Int64
    next::ListNode
    ListNode() = new()
    ListNode(x) = new(x)
    ListNode(x, y) = new(x, y)
end
function list(n=128)
    start::ListNode = ListNode(1)
    current::ListNode = start
    for i = 2:(n*1024^2)
        current = ListNode(i,current)
    end
    return current.key
end

function main()
    println("Creating garbage")
    x = list()
    @time GC.gc(true)

    # launch a kernel that will schedule a callback from a foreign thread
    @metal identity(nothing)

    # invoke the GC so that the callback happens during GC
    GC.gc(true)
end

main()

Running under lldb shows the same:

  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x0000000101862f44 libjulia-internal.1.9.dylib`gc_mark_loop + 5892
    frame #1: 0x0000000101865228 libjulia-internal.1.9.dylib`_jl_gc_collect + 1944
    frame #2: 0x00000001018649f0 libjulia-internal.1.9.dylib`ijl_gc_collect + 436
    frame #3: 0x000000015104046c JIT(0x151038000) at gcutils.jl:98
    frame #4: 0x00000001510404c8 JIT(0x151038000)
    frame #5: 0x000000010181988c libjulia-internal.1.9.dylib`ijl_apply_generic + 1732
    frame #6: 0x00000001018327b4 libjulia-internal.1.9.dylib`do_call + 188
    frame #7: 0x0000000101830fd8 libjulia-internal.1.9.dylib`eval_body + 1476
    frame #8: 0x0000000101831624 libjulia-internal.1.9.dylib`jl_interpret_toplevel_thunk + 260
    frame #9: 0x0000000101848794 libjulia-internal.1.9.dylib`jl_toplevel_eval_flex + 4620
    frame #10: 0x00000001018486b8 libjulia-internal.1.9.dylib`jl_toplevel_eval_flex + 4400
    frame #11: 0x00000001018494dc libjulia-internal.1.9.dylib`ijl_toplevel_eval_in + 156
    frame #12: 0x0000000124550798 sys.dylib`japi1_include_string_50563.clone_3 at boot.jl:370
    frame #13: 0x000000010181988c libjulia-internal.1.9.dylib`ijl_apply_generic + 1732
    frame #14: 0x00000001233390d8 sys.dylib`japi1__include_36508 at loading.jl:1924
    frame #15: 0x000000012333921c sys.dylib`julia_include_48873 at Base.jl:457
    frame #16: 0x0000000123339234 sys.dylib`jfptr_include_48874 + 12
    frame #17: 0x000000010181988c libjulia-internal.1.9.dylib`ijl_apply_generic + 1732
    frame #18: 0x0000000122f03b3c sys.dylib`julia_exec_options_44519 at client.jl:307
    frame #19: 0x0000000122f03fbc sys.dylib`julia__start_48792 at client.jl:522
    frame #20: 0x0000000122f040b0 sys.dylib`jfptr__start_48793 + 8
    frame #21: 0x000000010181988c libjulia-internal.1.9.dylib`ijl_apply_generic + 1732
    frame #22: 0x0000000101870788 libjulia-internal.1.9.dylib`true_main + 192
    frame #23: 0x000000010187067c libjulia-internal.1.9.dylib`jl_repl_entrypoint + 180
    frame #24: 0x0000000100003f6c julia`main + 12
    frame #25: 0x00000001a19f3f28 dyld`start + 2236

* thread #9, queue = 'com.Metal.CompletionQueueDispatch', stop reason = EXC_BAD_ACCESS (code=2, address=0x1000e8008)
  * frame #0: 0x000000010186119c libjulia-internal.1.9.dylib`jl_gc_pool_alloc_inner + 52
    frame #1: 0x0000000101836f04 libjulia-internal.1.9.dylib`jl_init_root_task + 200
    frame #2: 0x000000010185c47c libjulia-internal.1.9.dylib`ijl_adopt_thread + 80
    frame #3: 0x0000000169df02ac JIT(0x169de8000)
    frame #4: 0x00000001ab1a0498 Metal`MTLDispatchListApply + 52
    frame #5: 0x00000001ab1a0894 Metal`-[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:] + 524
    frame #6: 0x00000001bea45cc4 IOGPU`-[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:] + 220
    frame #7: 0x00000001ab1a052c Metal`-[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] + 108
    frame #8: 0x00000001bea4f190 IOGPU`IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 128
    frame #9: 0x00000001bea4f29c IOGPU`__IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
    frame #10: 0x00000001a1b9c4c0 libdispatch.dylib`_dispatch_client_callout4 + 20
    frame #11: 0x00000001a1bb8ed8 libdispatch.dylib`_dispatch_mach_msg_invoke + 468
    frame #12: 0x00000001a1ba3960 libdispatch.dylib`_dispatch_lane_serial_drain + 372
    frame #13: 0x00000001a1bb9c24 libdispatch.dylib`_dispatch_mach_invoke + 448
    frame #14: 0x00000001a1ba3960 libdispatch.dylib`_dispatch_lane_serial_drain + 372
    frame #15: 0x00000001a1ba462c libdispatch.dylib`_dispatch_lane_invoke + 436
    frame #16: 0x00000001a1ba3960 libdispatch.dylib`_dispatch_lane_serial_drain + 372
    frame #17: 0x00000001a1ba45f8 libdispatch.dylib`_dispatch_lane_invoke + 384
    frame #18: 0x00000001a1baf244 libdispatch.dylib`_dispatch_workloop_worker_thread + 648
    frame #19: 0x00000001a1d48074 libsystem_pthread.dylib`_pthread_wqthread + 288

With debug info:

  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x0000000101e7ca8c libjulia-internal-debug.1.10.dylib`gc_heap_snapshot_record_array_edge(from=0x000000010731a8c0, to=0x000000016fdfaac0) at gc-heap-snapshot.h:68:1
    frame #1: 0x0000000101e749a8 libjulia-internal-debug.1.10.dylib`gc_mark_objarray(ptls=0x0000000100842800, obj_parent=0x000000010731a8c0, obj_begin=0x00000001068728e0, obj_end=0x0000000106872938, step=1, nptr=62) at gc.c:2014:17
    frame #2: 0x0000000101e7fe1c libjulia-internal-debug.1.10.dylib`gc_queue_remset at gc.c:2628:17
    frame #3: 0x0000000101e7eee0 libjulia-internal-debug.1.10.dylib`gc_queue_remset(ptls=0x0000000100842800, ptls2=0x0000000100842800) at gc.c:2984:9
    frame #4: 0x0000000101e78e2c libjulia-internal-debug.1.10.dylib`_jl_gc_collect(ptls=0x0000000100842800, collection=JL_GC_FULL) at gc.c:3196:17
    frame #5: 0x0000000101e789c8 libjulia-internal-debug.1.10.dylib`ijl_gc_collect(collection=JL_GC_FULL) at gc.c:3523:13
    frame #6: 0x0000000147dac488 JIT(0x147da4000) at gcutils.jl:129
    frame #7: 0x0000000147dac508 JIT(0x147da4000)
    frame #8: 0x0000000101df84b0 libjulia-internal-debug.1.10.dylib`_jl_invoke(F=0x000000014507d118, args=0x000000016fdfb6a8, nargs=0, mfunc=0x0000000107b1b170, world=30688) at gf.c:2888:23
    frame #9: 0x0000000101df8554 libjulia-internal-debug.1.10.dylib`ijl_apply_generic(F=0x000000014507d118, args=0x000000016fdfb6a8, nargs=0) at gf.c:3070:12
    frame #10: 0x0000000101e22168 libjulia-internal-debug.1.10.dylib`jl_apply(args=0x000000016fdfb6a0, nargs=1) at julia.h:1961:12
    frame #11: 0x0000000101e21e78 libjulia-internal-debug.1.10.dylib`do_call(args=0x00000001056efaf8, nargs=1, s=0x000000016fdfbc30) at interpreter.c:125:26
    frame #12: 0x0000000101e20410 libjulia-internal-debug.1.10.dylib`eval_value(e=0x000000010574db50, s=0x000000016fdfbc30) at interpreter.c:222:16
    frame #13: 0x0000000101e2170c libjulia-internal-debug.1.10.dylib`eval_stmt_value(stmt=0x000000010574db50, s=0x000000016fdfbc30) at interpreter.c:173:23
    frame #14: 0x0000000101e1f390 libjulia-internal-debug.1.10.dylib`eval_body(stmts=0x00000001056efa90, s=0x000000016fdfbc30, ip=0, toplevel=1) at interpreter.c:602:21
    frame #15: 0x0000000101e1fd04 libjulia-internal-debug.1.10.dylib`jl_interpret_toplevel_thunk(m=0x00000001268cfef0, src=0x0000000104745d10) at interpreter.c:760:21
    frame #16: 0x0000000101e4b684 libjulia-internal-debug.1.10.dylib`jl_toplevel_eval_flex(m=0x00000001268cfef0, e=0x0000000104259d50, fast=1, expanded=0) at toplevel.c:922:18
    frame #17: 0x0000000101e4b19c libjulia-internal-debug.1.10.dylib`jl_toplevel_eval_flex(m=0x00000001268cfef0, e=0x0000000104259e10, fast=1, expanded=0) at toplevel.c:865:19
    frame #18: 0x0000000101e4ced8 libjulia-internal-debug.1.10.dylib`ijl_toplevel_eval(m=0x00000001268cfef0, v=0x0000000104259e10) at toplevel.c:931:12
    frame #19: 0x0000000101e4d1e0 libjulia-internal-debug.1.10.dylib`ijl_toplevel_eval_in(m=0x00000001268cfef0, ex=0x0000000104259e10) at toplevel.c:981:13
    frame #20: 0x000000012255babc sys-debug.dylib`japi1_include_string_81225 at boot.jl:383
    frame #21: 0x0000000101dec8d4 libjulia-internal-debug.1.10.dylib`jl_fptr_args(f=0x0000000122fc78b0, args=0x000000016fdfc760, nargs=4, m=0x0000000123da2b90) at gf.c:2531:12
    frame #22: 0x0000000101df83c4 libjulia-internal-debug.1.10.dylib`_jl_invoke(F=0x0000000122fc78b0, args=0x000000016fdfc760, nargs=4, mfunc=0x0000000123da2b40, world=30638) at gf.c:2869:35
    frame #23: 0x0000000101df8554 libjulia-internal-debug.1.10.dylib`ijl_apply_generic(F=0x0000000122fc78b0, args=0x000000016fdfc760, nargs=4) at gf.c:3070:12
    frame #24: 0x00000001223ddbdc sys-debug.dylib`japi1__include_81232 at loading.jl:2040
    frame #25: 0x0000000121c075ac sys-debug.dylib`julia_include_48400 at Base.jl:488
    frame #26: 0x0000000121fda854 sys-debug.dylib`jfptr_include_48401 + 16
    frame #27: 0x0000000101df83c4 libjulia-internal-debug.1.10.dylib`_jl_invoke(F=0x0000000122948620, args=0x000000016fdfe1b8, nargs=2, mfunc=0x0000000122948890, world=30638) at gf.c:2869:35
    frame #28: 0x0000000101df8554 libjulia-internal-debug.1.10.dylib`ijl_apply_generic(F=0x0000000122948620, args=0x000000016fdfe1b8, nargs=2) at gf.c:3070:12
    frame #29: 0x0000000121b65e70 sys-debug.dylib`julia_exec_options_82744 at client.jl:307
    frame #30: 0x0000000121b66494 sys-debug.dylib`julia__start_82889 at client.jl:541
    frame #31: 0x0000000121f3b008 sys-debug.dylib`jfptr__start_82890 + 8
    frame #32: 0x0000000101df83c4 libjulia-internal-debug.1.10.dylib`_jl_invoke(F=0x0000000122914ad0, args=0x000000016fdfe930, nargs=0, mfunc=0x0000000122914960, world=30638) at gf.c:2869:35
    frame #33: 0x0000000101df8554 libjulia-internal-debug.1.10.dylib`ijl_apply_generic(F=0x0000000122914ad0, args=0x000000016fdfe930, nargs=0) at gf.c:3070:12
    frame #34: 0x0000000101e9103c libjulia-internal-debug.1.10.dylib`jl_apply(args=0x000000016fdfe928, nargs=1) at julia.h:1961:12
    frame #35: 0x0000000101e92498 libjulia-internal-debug.1.10.dylib`true_main(argc=1, argv=0x000000016fdfecc8) at jlapi.c:582:13
    frame #36: 0x0000000101e92338 libjulia-internal-debug.1.10.dylib`jl_repl_entrypoint(argc=1, argv=0x000000016fdfecb0) at jlapi.c:734:15
    frame #37: 0x000000010008afcc libjulia-debug.1.dylib`jl_load_repl(argc=4, argv=0x000000016fdfecb0) at loader_lib.c:563:12
    frame #38: 0x0000000100003f64 julia-debug`main(argc=4, argv=0x000000016fdfecb0) at loader_exe.c:58:15
    frame #39: 0x00000001a19f3f28 dyld`start + 2236

* thread #9, queue = 'com.Metal.CompletionQueueDispatch', stop reason = EXC_BAD_ACCESS (code=2, address=0x1000e0008)
  * frame #0: 0x0000000101e7a490 libjulia-internal-debug.1.10.dylib`maybe_collect(ptls=0x000000016b00a200) at gc.c:957:9
    frame #1: 0x0000000101e739d4 libjulia-internal-debug.1.10.dylib`jl_gc_pool_alloc_inner(ptls=0x000000016b00a200, pool_offset=1832, osize=400) at gc.c:1323:5
    frame #2: 0x0000000101e73cf8 libjulia-internal-debug.1.10.dylib`jl_gc_pool_alloc_noinline(ptls=0x000000016b00a200, pool_offset=1832, osize=400) at gc.c:1383:12
    frame #3: 0x0000000101e292b4 libjulia-internal-debug.1.10.dylib`jl_gc_alloc_(ptls=0x000000016b00a200, sz=376, ty=0x0000000126cd63d0) at julia_internal.h:467:13
    frame #4: 0x0000000101e295c8 libjulia-internal-debug.1.10.dylib`jl_init_root_task(ptls=0x000000016b00a200, stack_lo=0x000000017c25b000, stack_hi=0x000000017c25a300) at task.c:1637:33
    frame #5: 0x0000000101e6c9e0 libjulia-internal-debug.1.10.dylib`ijl_adopt_thread at threading.c:418:21
    frame #6: 0x000000016ba9c2b8 JIT(0x16ba94000)
    frame #7: 0x00000001a2513078 MetalTools`-[MTLToolsCommandBuffer invokeCompletedHandlers] + 100
    frame #8: 0x00000001a2513078 MetalTools`-[MTLToolsCommandBuffer invokeCompletedHandlers] + 100
    frame #9: 0x00000001ab1a0498 Metal`MTLDispatchListApply + 52
    frame #10: 0x00000001ab1a0894 Metal`-[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:] + 524
    frame #11: 0x00000001bea45cc4 IOGPU`-[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:] + 220
    frame #12: 0x00000001ab1a052c Metal`-[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] + 108
    frame #13: 0x00000001bea4f190 IOGPU`IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 128
    frame #14: 0x00000001bea4f29c IOGPU`__IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
    frame #15: 0x00000001a1b9c4c0 libdispatch.dylib`_dispatch_client_callout4 + 20
    frame #16: 0x00000001a1bb8ed8 libdispatch.dylib`_dispatch_mach_msg_invoke + 468
    frame #17: 0x00000001a1ba3960 libdispatch.dylib`_dispatch_lane_serial_drain + 372
    frame #18: 0x00000001a1bb9c24 libdispatch.dylib`_dispatch_mach_invoke + 448
    frame #19: 0x00000001a1ba3960 libdispatch.dylib`_dispatch_lane_serial_drain + 372
    frame #20: 0x00000001a1ba462c libdispatch.dylib`_dispatch_lane_invoke + 436
    frame #21: 0x00000001a1ba3960 libdispatch.dylib`_dispatch_lane_serial_drain + 372
    frame #22: 0x00000001a1ba45f8 libdispatch.dylib`_dispatch_lane_invoke + 384
    frame #23: 0x00000001a1baf244 libdispatch.dylib`_dispatch_workloop_worker_thread + 648
    frame #24: 0x00000001a1d48074 libsystem_pthread.dylib`_pthread_wqthread + 288
maleadt commented 1 year ago

Should be fixed in Julia 1.9.1, and on master of course.

christiangnrd commented 1 year ago

Should #184 be reverted? Just saw #191

maleadt commented 1 year ago

Correction, the fix sadly did not make it in time for 1.9.1, so it'll be part of 1.9.2.
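
For anyone who wants to guard against the affected releases in the meantime, a minimal sketch of a version check, assuming the fix lands in 1.9.2 as stated above. `runtime_has_adoption_fix` is a hypothetical helper, not part of Metal.jl or Base. It only compares version numbers, so a development build that predates the fix would still pass.

```julia
# Hypothetical helper: true when the Julia version is at least 1.9.2,
# the release expected to contain the fix.
runtime_has_adoption_fix(v::VersionNumber = VERSION) = v >= v"1.9.2"

if !runtime_has_adoption_fix()
    @warn "This Julia version may crash when a callback arrives from a foreign thread during GC" VERSION
end
```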