JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Method definitions break native rand! kernel #1508

Open sigmike opened 2 years ago

sigmike commented 2 years ago

Describe the bug

After a system and Julia update I can't train vgg_cifar10.jl from the Flux model zoo anymore. It generates this error:

ERROR: LoadError: InvalidIRError: compiling kernel rand!(CuDeviceVector{Float32, 1}, UInt32, UInt32) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to CUDA.Philox2x32{R}() where R in CUDA at ~/.julia/packages/CUDA/01uIm/src/device/random.jl:46)
Stacktrace (from the MWE below)

```
Stacktrace:
 [1] Philox2x32 @ ~/.julia/packages/CUDA/01uIm/src/device/random.jl:62
 [2] #default_rng @ ~/.julia/packages/CUDA/01uIm/src/device/random.jl:95
 [3] kernel @ ~/.julia/packages/CUDA/01uIm/src/random.jl:39
Reason: unsupported dynamic function invocation (call to rand(rng::AbstractRNG, ::Type{X}) where X in Random at /usr/share/julia/stdlib/v1.7/Random/src/Random.jl:257)
Stacktrace:
 [1] kernel @ ~/.julia/packages/CUDA/01uIm/src/random.jl:51
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{CUDA.var"#kernel#356", Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}}, args::LLVM.Module) @ GPUCompiler ~/.julia/packages/GPUCompiler/EV8pB/src/validation.jl:139
  [2] macro expansion @ ~/.julia/packages/GPUCompiler/EV8pB/src/driver.jl:391 [inlined]
  [3] macro expansion @ ~/.julia/packages/TimerOutputs/LDL7n/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion @ ~/.julia/packages/GPUCompiler/EV8pB/src/driver.jl:389 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType) @ GPUCompiler ~/.julia/packages/GPUCompiler/EV8pB/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context) @ CUDA ~/.julia/packages/CUDA/01uIm/src/compiler/execution.jl:337
  [7] #260 @ ~/.julia/packages/CUDA/01uIm/src/compiler/execution.jl:330 [inlined]
  [8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{CUDA.var"#kernel#356", Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}}}) @ GPUCompiler ~/.julia/packages/GPUCompiler/EV8pB/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob) @ CUDA ~/.julia/packages/CUDA/01uIm/src/compiler/execution.jl:329
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link)) @ GPUCompiler ~/.julia/packages/GPUCompiler/EV8pB/src/cache.jl:90
 [11] cufunction(f::CUDA.var"#kernel#356", tt::Type{Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}) @ CUDA ~/.julia/packages/CUDA/01uIm/src/compiler/execution.jl:301
 [12] macro expansion @ ~/.julia/packages/CUDA/01uIm/src/compiler/execution.jl:102 [inlined]
 [13] rand!(rng::CUDA.RNG, A::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) @ CUDA ~/.julia/packages/CUDA/01uIm/src/random.jl:60
 [14] top-level scope @ mwe.jl:12
in expression starting at mwe.jl:12
```

To reproduce

I discussed this on Discourse and was able to reduce it to the following MWE:

using CUDA
using Random

struct X{T} end
Broadcast.broadcasted(f::X, args...) = map(f, args...)

struct Y <: Base.Broadcast.BroadcastStyle end
Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::Y) = Y()

a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a)

But if the Random.rand! call runs first, before the method definitions, the error does not occur:

using CUDA
using Random

a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a) # No error here

struct X{T} end
Broadcast.broadcasted(f::X, args...) = map(f, args...)

struct Y <: Base.Broadcast.BroadcastStyle end
Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::Y) = Y()

a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a) # Nor here

The actual lines in the original packages that trigger the error are:

Manifest.toml

```
# This file is machine-generated - editing it directly is not advised

julia_version = "1.7.2"
manifest_format = "2.0"

[[deps.AbstractFFTs]]
deps = ["ChainRulesCore", "LinearAlgebra"]
git-tree-sha1 = "6f1d9bc1c08f9f4a8fa92e3ea3cb50153a1b40d4"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.1.0"

[[deps.Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "af92965fb30777147966f58acb05da51c5616b5f"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.3.3"

[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "a598ecb0d717092b5539dbbe890c98bac842b072"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.2.0"

[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[deps.CEnum]]
git-tree-sha1 = "eb4cb44a499229b3b8426dcfb5dd85333951ff90"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.2"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CompilerSupportLibraries_jll", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions", "TimerOutputs"]
git-tree-sha1 = "939e46b905e00ffd0dde147f06d13e6c82b5423b"
repo-rev = "master"
repo-url = "https://github.com/JuliaGPU/CUDA.jl.git"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "3.9.2"

[[deps.ChainRulesCore]]
deps = ["Compat", "LinearAlgebra", "SparseArrays"]
git-tree-sha1 = "9950387274246d08af38f6eef8cb5480862a435f"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "1.14.0"

[[deps.ChangesOfVariables]]
deps = ["ChainRulesCore", "LinearAlgebra", "Test"]
git-tree-sha1 = "1e315e3f4b0b7ce40feded39c73049692126cf53"
uuid = "9e997f8a-9a97-42d5-a9f1-ce6bfc15e2c0"
version = "0.1.3"

[[deps.Compat]]
deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "b153278a25dd42c65abbf4e62344f9d22e59191b"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.43.0"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"

[[deps.Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[deps.DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"

[[deps.Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[deps.DocStringExtensions]]
deps = ["LibGit2"]
git-tree-sha1 = "b19534d1895d702889b219c382a6e18010797f0b"
uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
version = "0.8.6"

[[deps.Downloads]]
deps = ["ArgTools", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[[deps.ExprTools]]
git-tree-sha1 = "56559bbef6ca5ea0c0818fa5c90320398a6fbf8d"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.8"

[[deps.GPUArrays]]
deps = ["Adapt", "LLVM", "LinearAlgebra", "Printf", "Random", "Serialization", "Statistics"]
git-tree-sha1 = "c783e8883028bf26fb05ed4022c450ef44edd875"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.3.2"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "05374e47bb136db517b33f62fbe852adf8deb0be"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.15.1"

[[deps.InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[deps.InverseFunctions]]
deps = ["Test"]
git-tree-sha1 = "336cc738f03e069ef2cac55a104eb823455dca75"
uuid = "3587e190-3f89-42d0-90ee-14403ec27112"
version = "0.1.4"

[[deps.IrrationalConstants]]
git-tree-sha1 = "7fd44fd4ff43fc60815f8e764c0f352b83c49151"
uuid = "92d709cd-6900-40b7-9082-c6be49f344b6"
version = "0.1.1"

[[deps.JLLWrappers]]
deps = ["Preferences"]
git-tree-sha1 = "abc9885a7ca2052a736a600f7fa66209f96506e1"
uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210"
version = "1.4.1"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "c8d47589611803a0f3b4813d9e267cd4e3dbcefb"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "4.11.1"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg", "TOML"]
git-tree-sha1 = "771bfe376249626d3ca12bcd58ba243d3f961576"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.16+0"

[[deps.LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[deps.LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"

[[deps.LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"

[[deps.LibGit2]]
deps = ["Base64", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[deps.LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[deps.LinearAlgebra]]
deps = ["Libdl", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[deps.LogExpFunctions]]
deps = ["ChainRulesCore", "ChangesOfVariables", "DocStringExtensions", "InverseFunctions", "IrrationalConstants", "LinearAlgebra"]
git-tree-sha1 = "09e4b894ce6a976c354a69041a04748180d43637"
uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
version = "0.3.15"

[[deps.Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[deps.Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[deps.MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"

[[deps.Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[deps.MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"

[[deps.NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"

[[deps.OpenLibm_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "05823500-19ac-5b8b-9628-191a04bc5112"

[[deps.OpenSpecFun_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "13652491f6856acfd2db29360e1bbcd4565d04f1"
uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.5+0"

[[deps.Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"

[[deps.Preferences]]
deps = ["TOML"]
git-tree-sha1 = "47e5f437cc0e7ef2ce8406ce1e7e24d44915f88d"
uuid = "21216c6a-2e73-6563-6e65-726566657250"
version = "1.3.0"

[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[deps.REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[deps.Random]]
deps = ["SHA", "Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[deps.Random123]]
deps = ["Random", "RandomNumbers"]
git-tree-sha1 = "afeacaecf4ed1649555a19cb2cad3c141bbc9474"
uuid = "74087812-796a-5b5d-8853-05524746bad3"
version = "1.5.0"

[[deps.RandomNumbers]]
deps = ["Random", "Requires"]
git-tree-sha1 = "043da614cc7e95c703498a491e2c21f58a2b8111"
uuid = "e6cf234a-135c-5ec9-84dd-332b85af5143"
version = "1.5.3"

[[deps.Reexport]]
git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.2.2"

[[deps.Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.3.0"

[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"

[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[deps.SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[deps.Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[deps.SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[[deps.SpecialFunctions]]
deps = ["ChainRulesCore", "IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"]
git-tree-sha1 = "5ba658aeecaaf96923dce0da9e703bd1fe7666f9"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "2.1.4"

[[deps.Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[[deps.TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[[deps.Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"

[[deps.Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.TimerOutputs]]
deps = ["ExprTools", "Printf"]
git-tree-sha1 = "7638550aaea1c9a1e86817a231ef0faa9aca79bd"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.19"

[[deps.UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[deps.Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl", "OpenBLAS_jll"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"

[[deps.nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"

[[deps.p7zip_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"
```

Expected behavior

No error.

Version info

Details on Julia:

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

Details on CUDA:

CUDA toolkit 11.7, artifact installation
NVIDIA driver 510.68.2, for CUDA 11.6
CUDA driver 11.6

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+510.68.2
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.7.2
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce GTX 1080 (sm_61, 7.325 GiB / 8.000 GiB available)

maleadt commented 2 years ago

Thanks for the MWE. I'm not sure we can do much about this though; inference/compilation depending on which additional methods are defined can be caused by a heuristic kicking in after a limit has been exceeded, or by 'bugs' like https://github.com/JuliaLang/julia/issues/35800. Here, the code pulled in by RandomNumbers.AbstractRNG seems to be pretty massive, relying on device-side broadcast, which fails to compile statically after those method definitions. Simplifying that code might be one course of action.

That said, I also noticed that the code works again on 1.8, so maybe that's an easier solution?

stefanjwojcik commented 2 years ago

I believe I'm having the same issue with the DCGAN example in the Flux model zoo. I haven't yet been able to resolve it.

mossr commented 1 year ago

FYI, I had the same error when using Dropout as a layer in Julia v1.7, and it was resolved by switching to Julia v1.8.
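
Since the reports in this thread point to Julia 1.8 as the version where the error goes away, a script that depends on `rand!` for `CuArray`s could check the running version up front. This is only a minimal sketch: the helper name `needs_julia_upgrade` is hypothetical, and the 1.8 threshold is taken from the comments above, not from any documented fix.

```julia
# Hypothetical helper (not part of CUDA.jl): returns true when the given
# Julia version predates 1.8, the release that commenters in this thread
# reported as no longer showing the rand! InvalidIRError.
# Note that v"1.8" means 1.8.0, so 1.8 pre-releases also count as older.
needs_julia_upgrade(v::VersionNumber = VERSION) = v < v"1.8"

if needs_julia_upgrade()
    @warn "Julia $(VERSION) may hit the rand! InvalidIRError described in this issue; users here report that Julia 1.8 resolves it."
end
```

On Julia 1.7.2, the version from the original report, the check returns `true` and the warning is emitted; on 1.8.0 and later it stays silent.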