Open jgreener64 opened 1 year ago
@jgreener64 can you post the whole log?
I also cannot reproduce this on my system (1.8.1, NVIDIA 3090), latest Enzyme.jl and Enzyme proper.
I have attached error.txt and the error with Enzyme.API.printall!(true), since they are over the text box size limit.
My setup (Julia updated since the top post) is Enzyme e452f8932fc602989df23d96e5039a3268e5e965, Enzyme_jll 0.0.42, an NVIDIA RTX A6000 GPU and
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
Threads: 16 on 36 virtual cores
Environment:
LD_LIBRARY_PATH = /usr/local/gromacs/lib
CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.141.3, for CUDA 11.4
CUDA driver 11.7
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+470.141.3
Downloaded artifact: CUDNN
- CUDNN: 8.30.2 (for CUDA 11.5.0)
Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
2 devices:
0: NVIDIA RTX A6000 (sm_86, 47.531 GiB / 47.544 GiB available)
1: NVIDIA RTX A6000 (sm_86, 19.788 GiB / 47.541 GiB available)
⌃ [21141c5a] AMDGPU v0.4.2
[4c88cf16] Aqua v0.5.5
[a963bdd2] AtomsBase v0.2.2 `~/.julia/dev/AtomsBase`
[6e4b80f9] BenchmarkTools v1.3.1
[99c8bb3a] Bio3DView v0.1.4 `~/.julia/dev/Bio3DView`
[de9282ab] BioStructures v1.2.1 `~/.julia/dev/BioStructures`
[052768ef] CUDA v3.12.0
[69e1c6dd] CellListMap v0.8.4
⌃ [082447d4] ChainRules v1.42.0
[d360d2e6] ChainRulesCore v1.15.6
[46823bd8] Chemfiles v0.10.3
⌃ [31c24e10] Distributions v0.25.75
[7da242da] Enzyme v0.10.11 `~/.julia/dev/Enzyme`
[8f5d6c58] EzXML v1.1.0
⌃ [26cc04aa] FiniteDifferences v0.12.24
[1fa38f19] Format v1.3.2
[f6369f11] ForwardDiff v0.10.32
[e9467ef8] GLMakie v0.6.13
[7073ff75] IJulia v1.23.3
[63c18a36] KernelAbstractions v0.8.4
[259c3a9c] MMTF v1.0.0 `~/.julia/dev/MMTF`
⌅ [ee78f7c6] Makie v0.17.13
[aa0f7f06] Molly v0.13.0 `~/.julia/dev/Molly`
[5fb14364] OhMyREPL v0.5.12
[32113eaa] PkgBenchmark v0.2.12
⌃ [91a5bcdd] Plots v1.34.3
[c46f51b8] ProfileView v1.5.2
[186d2b2d] ProteinEnsembles v0.3.1 `~/.julia/dev/ProteinEnsembles`
[295af30f] Revise v3.4.0
[90137ffa] StaticArrays v1.5.9
⌃ [f3b207a7] StatsPlots v0.15.3
[1986cc42] Unitful v1.12.0
[f31437dd] UnitfulChainRules v0.1.2
[e88e6eb3] Zygote v0.6.44 `~/.julia/dev/Zygote`
[7cc45869] Enzyme_jll v0.0.42+0
Can you retry on the latest main?
On d37ce7247b9cabd910c5aa73ed8bd6f5d73bb7d2 with Enzyme_jll 0.0.43 it still errors, but the error changes:
ERROR: LoadError: LLVM error: Cannot select: 0x9f6b278: f32,ch = AtomicLoad<(load acquire (s32) from %ir."'ipc123_unwrap.i.i", addrspace 1)> 0x76f0260:1, 0x97e5870, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:106 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:90 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x97e5870: i64 = add 0x9be74b0, 0x9f6a8b8, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:106 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:92 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x9be74b0: i64,ch = CopyFromReg 0xa0e1498, Register:i64 %12, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:95 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x974c3b0: i64 = Register %12
0x9f6a8b8: i64 = mul 0x72d09c8, 0x9f6a648, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:106 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:92 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x72d09c8: i64 = add nsw 0x72d1118, Constant:i64<-1>, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:106 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:92 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x72d1118: i64,ch = load<(load (s32) from %ir.375, !tbaa !305), sext from i32> 0x9f6b140:1, 0x9f6af38, undef:i64, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/pointer.jl:9 @[ /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/pointer.jl:9 @[ /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/pointer.jl:81 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/array.jl:119 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/array.jl:111 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/array.jl:192 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:78 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x9f6af38: i64 = add 0x9be7928, 0x97e5530, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:106 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:92 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x9be7928: i64,ch = CopyFromReg 0xa0e1498, Register:i64 %156, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:106 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:92 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x974bf38: i64 = Register %156
0x97e5530: i64 = shl 0x974c1a8, Constant:i32<2>, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:95 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x974c1a8: i64,ch = CopyFromReg 0xa0e1498, Register:i64 %171, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:95 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x93fab10: i64 = Register %171
0x9be7ed8: i32 = Constant<2>
0x9be77f0: i64 = undef
0x76ef8a0: i64 = Constant<-1>
0x9f6a648: i64 = shl 0x72d11e8, Constant:i32<2>, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:95 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x72d11e8: i64 = smax 0x9be7990, Constant:i64<0>, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:95 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x9be7990: i64,ch = CopyFromReg 0xa0e1498, Register:i64 %11, /home/jgreener/.julia/packages/LLVM/WjSQG/src/interop/base.jl:40 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:28 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:99 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:456 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:444 @[ /home/jgreener/.julia/packages/CUDA/DfvRa/src/device/intrinsics/atomics.jl:439 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:95 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:0 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6270 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:6001 @[ /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5978 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:384 @[ /home/jgreener/.julia/dev/Enzyme/src/Enzyme.jl:398 @[ /home/jgreener/dms/molly_dev/enzyme_err2d.jl:107 ] ] ] ] ] ] ] ] ] ] ] ] ] ]
0x97e53f8: i64 = Register %11
0x97e51f0: i64 = Constant<0>
0x9be7ed8: i32 = Constant<2>
In function: _Z23julia_grad_kernel__418213CuDeviceArrayI7Float32Li2ELi1EES_IS0_Li2ELi1EES_I6SArrayI5TupleILi3EES0_Li1ELi3EELi1ELi1EES_IS1_IS2_ILi3EES0_Li1ELi3EELi1ELi1EES_I4AtomLi1ELi1EES_IS3_Li1ELi1EES_IS2_I5Int64S4_ELi1ELi1EE3ValILi512EE
Stacktrace:
[1] handle_error(reason::Cstring)
@ LLVM ~/.julia/packages/LLVM/WjSQG/src/core/context.jl:105
[2] LLVMTargetMachineEmitToMemoryBuffer
@ ~/.julia/packages/LLVM/WjSQG/lib/13/libLLVM_h.jl:947 [inlined]
[3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
@ LLVM ~/.julia/packages/LLVM/WjSQG/src/targetmachine.jl:45
[4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/mcgen.jl:73
[5] macro expansion
@ ~/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:253 [inlined]
[6] macro expansion
@ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:430 [inlined]
[7] macro expansion
@ ~/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:253 [inlined]
[8] macro expansion
@ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:427 [inlined]
[9] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/utils.jl:68
[10] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
@ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:354
[11] #224
@ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:347 [inlined]
[12] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(grad_kernel!), Tuple{CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{Tuple{Int64, Int64}, 1}, Val{512}}}}})
@ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:76
[13] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:346
[14] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/cache.jl:90
[15] cufunction(f::typeof(grad_kernel!), tt::Type{Tuple{CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{Tuple{Int64, Int64}, 1}, Val{512}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:299
[16] cufunction(f::typeof(grad_kernel!), tt::Type{Tuple{CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{Atom, 1}, CuDeviceVector{Tuple{Int64, Int64}, 1}, Val{512}}})
@ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:292
[17] macro expansion
@ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:102 [inlined]
[18] top-level scope
@ ~/.julia/packages/CUDA/DfvRa/src/utilities.jl:25
in expression starting at /home/jgreener/dms/molly_dev/enzyme_err2d.jl:127
The printall error is attached.
Oh that's much more exciting!
I wonder if the comment in https://reviews.llvm.org/D50391 is still true.
Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error.
We are trying to emit an atomic load with acquire ordering.
Yeah, looks similar to what I had in https://github.com/JuliaConcurrent/Atomix.jl/issues/33
It's weird that LLVM is trying to select AtomicLoad though. I only see RMW in the Julia code. Maybe Enzyme inserts some loads given some RMW in the user code? If so, I wonder if you can use Atomix.@atomic :monotonic forces[1, i] -= dx etc. to avoid it (provided that Enzyme copies the ordering).
(Note: you'd need Atomix for now since CUDA.jl uses acq_rel https://github.com/JuliaGPU/CUDA.jl/blob/0cd30cbed3d084cede39db1a9959630ddae904a1/src/device/intrinsics/atomics.jl#L43-L46)
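For reference, a rough CPU-side sketch of the syntax suggested above. It assumes the Atomix package is installed. The array shape and the values are made up for illustration, and this does not exercise the GPU kernel or the Enzyme code path from this issue.

```julia
using Atomix

# Hypothetical stand-in for the forces array from the kernel in this issue.
forces = zeros(Float32, 3, 4)
dx = 0.5f0
i = 2

# Relaxed (monotonic) atomic read-modify-write on an array element,
# rather than the acq_rel ordering mentioned in the note above.
Atomix.@atomic :monotonic forces[1, i] -= dx
Atomix.@atomic :monotonic forces[1, i] -= dx

# Plain read is fine here because this sketch is single-threaded.
val = forces[1, i]
```

On the GPU the same macro call would go inside the kernel in place of the plain forces[1, i] -= dx lines; whether Enzyme then differentiates it correctly is a separate question.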
Somewhat relevant https://github.com/JuliaGPU/CUDA.jl/pull/1393
Yeah the derivative of an atomicadd can create an atomic load. Presently we preserve the same ordering -- hence the above
Not sure how much of workarounds you'd want to add in Enzyme, but maybe you can use fetch-and-add with 0 for load (and swap for store) when the ordering is stronger than monotonic?
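To make the idea concrete, here is a CPU analogy in plain Julia (1.7+) using atomic struct fields. The names Cell, rmw_load and rmw_store! are hypothetical, and this is only a sketch of the "load as fetch-and-add of zero, store as swap" pattern, not what Enzyme or the NVPTX backend actually does.

```julia
# A mutable struct with an atomic field, standing in for one array element.
mutable struct Cell
    @atomic x::Float32
end

# Load emulated as a fetch-and-add of zero.
# modifyfield! returns the pair old => new; the old value is the "loaded" one.
rmw_load(c::Cell) = first(modifyfield!(c, :x, +, 0.0f0, :acquire_release))

# Store emulated as a swap; the previous value is discarded.
rmw_store!(c::Cell, v::Float32) = (swapfield!(c, :x, v, :acquire_release); nothing)

c = Cell(1.5f0)
loaded = rmw_load(c)             # reads the value through an RMW operation
rmw_store!(c, 2.5f0)             # writes the value through an RMW operation
stored = @atomic :monotonic c.x  # ordinary relaxed atomic load for comparison
```

The point of the pattern is that only read-modify-write operations are emitted at the stronger ordering, so the backend never has to select a plain atomic load or store with acquire or release semantics.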
I wonder if the comment in https://reviews.llvm.org/D50391 is still true.
Yeah, I still see the comment in the main branch https://github.com/llvm/llvm-project/blob/de6dfbbb300e552efa1cd86a023063a39d408b06/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp#L854-L859
I guess LLVM needs to do whatever NVCC does with libcu++ https://godbolt.org/z/aoM6477T4
Interesting to see the differences to sm_60 https://godbolt.org/z/Y7Pj5G7sK
If this is the issue, is there a way to update other software to get around it? My device is sm_86 and I am on CUDA 11.7.
Could you try Takafumi's suggestion in https://github.com/EnzymeAD/Enzyme.jl/issues/511#issuecomment-1279187387?
Replacing the forces[1, i] -= dx lines with Atomix.@atomic :monotonic forces[1, i] -= dx means it runs without throwing an error.
However d_cu_coords and d_cu_atoms remain zero, i.e. it doesn't seem like the gradients are recorded.
However d_cu_coords and d_cu_atoms remain zero, i.e. it doesn't seem like the gradients are recorded.
Could you open a new issue with that and a complete reproducer, as minimal as you can get it? :)
Looking into it but running into some segfaults that have appeared with recent commits: https://github.com/EnzymeAD/Enzyme.jl/issues/533.
I am looking into a minimal example with Atomix but running into some non-Enzyme issues on the GPU so reported them at https://github.com/JuliaConcurrent/Atomix.jl/issues/33.
@jgreener64 Is there some equivalent C / CUDA code (in GROMACS, for example) we could look at to see if we can reproduce this issue there? We are trying to see if this is a Julia issue or an Enzyme issue.
The kernels in the fastest software are more complicated, using warp reductions and clever ordering of pairs to get high speed. See for example the CUDA kernel in OpenMM, which uses some atomics: https://github.com/openmm/openmm/blob/master/platforms/cuda/src/kernels/nonbonded.cu. There are likely some simpler implementations around but I don't know of any off the top of my head.
I think this issue may be solved, though, based on @vchuravy's comment in https://github.com/EnzymeAD/Enzyme.jl/issues/576? In particular when I run that code (which differs from the top code here by using UnsafeAtomicsLLVM and += for all forces) on Enzyme 0.10.15 with https://github.com/JuliaGPU/CUDA.jl/pull/1644 and the -g0 Julia flag, it seems to work. By work I mean that d_cu_coords is not zero like it was before; I can test for correctness later.
Great! Thanks for the reference, and yes. @vchuravy and I were in a meeting discussing this and we think things "work" (tm) now, but also did not check for correctness.
Brilliant, thanks for all the help on this. If it's helpful I can make a PR adding this as a regression test to Enzyme once https://github.com/JuliaGPU/CUDA.jl/pull/1644 is in a release and https://github.com/EnzymeAD/Enzyme.jl/issues/576 is fixed.
I am on Julia 1.8.1 and Enzyme e452f8932fc602989df23d96e5039a3268e5e965. The following works:
However it errors when I make the forces[1, i] -= dx lines use CUDA.@atomic (as commented out). The truncated error message is: