Open jgreener64 opened 1 month ago
Hm, that error looks like you're not running with the Enzyme CUDA ext package. With it I get an error in mapreduce!
In essence we just ought to properly define the derivative kernel for that, so I'd argue it's more feature dev than a bug.
On Wed, Jul 31, 2024 at 12:49 PM Joe Greener @.***> wrote:
Describe the bug
Reductions with GPU broadcasting error with Enzyme. @wsmoses (https://github.com/wsmoses) suggested I open an issue here.
To reproduce
The Minimal Working Example (MWE) for this bug:
```julia
using Enzyme, CUDA

f(x, y) = sum(x .+ y)

x = CuArray(rand(5))
y = CuArray(rand(5))
dx = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])

autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))
```
```
ERROR: Enzyme execution failed.
Enzyme compilation failed.
No create nofree of empty function (jl_gc_safe_enter)
 at context: call fastcc void @julia__launch_configuration_979_4373([2 x i64] noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %7, i64 noundef signext 0, { i64, {} addrspace(10) } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(32) %45) #715, !dbg !1090 (julia__launch_configuration_979_4373)

Stacktrace:
 [1] launch_configuration @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56
 [2] #launch_heuristic#1204 @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22
 [3] launch_heuristic @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15
 [4] _copyto! @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78
 [5] copyto! @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44
 [6] copy @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29
 [7] materialize @ ./broadcast.jl:903
 [8] f @ ./REPL[2]:1

Stacktrace:
 [1] throwerr(cstr::Cstring) @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1797
 [2] launch_configuration @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56 [inlined]
 [3] #launch_heuristic#1204 @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22 [inlined]
 [4] launch_heuristic @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15 [inlined]
 [5] _copyto! @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78 [inlined]
 [6] copyto! @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44 [inlined]
 [7] copy @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29 [inlined]
 [8] materialize @ ./broadcast.jl:903 [inlined]
 [9] f @ ./REPL[2]:1 [inlined]
 [10] diffejulia_f_2820wrap @ ./REPL[2]:0
 [11] macro expansion @ ~/.julia/dev/Enzyme/src/compiler.jl:6819 [inlined]
 [12] enzyme_call @ ~/.julia/dev/Enzyme/src/compiler.jl:6419 [inlined]
 [13] CombinedAdjointThunk @ ~/.julia/dev/Enzyme/src/compiler.jl:6296 [inlined]
 [14] autodiff @ ~/.julia/dev/Enzyme/src/Enzyme.jl:314 [inlined]
 [15] autodiff(::ReverseMode{…}, ::typeof(f), ::Type{…}, ::Duplicated{…}, ::Const{…}) @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:326
 [16] top-level scope @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
```

Forward mode also fails. This is with Julia 1.10.3, Enzyme 0.12.26, GPUCompiler 0.26.7 and CUDA commit d7077da (https://github.com/JuliaGPU/CUDA.jl/commit/d7077da2b7df32f9d0a2bced56511cdd778ab4ed).
```
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 18 default, 0 interactive, 9 GC (on 36 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/local/gromacs/lib
```
Details on CUDA:
CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 535.183.1, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.183.1
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.1+1
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7
2 devices:
  0: NVIDIA RTX A6000 (sm_86, 46.970 GiB / 47.988 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 4.046 GiB / 47.988 GiB available)
— Reply to this email directly, or view it on GitHub: https://github.com/JuliaGPU/CUDA.jl/issues/2455
I don't see how this is a CUDA.jl issue.
Sorry, I mentioned it in the earlier Enzyme.jl issue -- I recommended Joe open an issue here since I think the resolution is extending the Enzyme CUDA ext with a rule that says the derivative of https://github.com/JuliaGPU/CUDA.jl/blob/d7077da2b7df32f9d0a2bced56511cdd778ab4ed/src/mapreduce.jl#L169 is [corresponding derivative fn].
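For reference, the kind of custom rule being described uses the EnzymeRules interface. Below is only an illustrative, untested sketch on a plain CPU reduction: `mysum` is a hypothetical stand-in, and the actual fix would instead target the `mapreducedim!` kernel linked above and handle general `f`/`op`, not just `sum`.

```julia
# Illustrative sketch only (not the actual CUDA.jl fix): a custom Enzyme
# reverse rule for a sum-style reduction, showing the EnzymeRules mechanism
# such a fix would use. For sum, d(sum)/dx[i] == 1 for every i, so the
# reverse pass just adds the output adjoint to each shadow-input element.
using Enzyme
import Enzyme.EnzymeRules: augmented_primal, reverse, AugmentedReturn,
                           ConfigWidth, needs_primal

mysum(x) = sum(x)  # hypothetical stand-in for the reduction kernel

function augmented_primal(config::ConfigWidth{1}, func::Const{typeof(mysum)},
                          ::Type{<:Active}, x::Duplicated)
    # Nothing needs to be cached on the tape for a plain sum.
    primal = needs_primal(config) ? func.val(x.val) : nothing
    return AugmentedReturn(primal, nothing, nothing)
end

function reverse(config::ConfigWidth{1}, func::Const{typeof(mysum)},
                 dret::Active, tape, x::Duplicated)
    # Scatter the output adjoint back into the shadow input.
    x.dval .+= dret.val
    return (nothing,)
end
```

With such a rule registered, Enzyme would never differentiate into the GPU launch machinery (`launch_configuration` etc.) that triggers the error above.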
Fair enough! Hope you don't mind me assigning the issue to you then 🙂
Oh yeah for sure, kind of assumed that :P