joelandman opened this issue 5 months ago
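Simple reproducer; I'm not sure whether this specific use case is supposed to be supported. CPU and GPU versions are shown for comparison. MI300X GPU, Ubuntu 22.04, ROCm 6.1 pre-release. The a_h and z_h results are as expected, and a_d and b_d are set properly, but on the MI300X the broadcast subtraction fails.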
Worth noting that this works on an MI50 and on the integrated GPU of a 7950X.
MI50
julia> using AMDGPU
julia> AMDGPU.devices()
┌────┬────────────────────┬────────────────────────┬───────────┬────────────┐
│ Id │ Name               │ GCN arch               │ Wavefront │ Memory     │
├────┼────────────────────┼────────────────────────┼───────────┼────────────┤
│  1 │ AMD Radeon VII     │ gfx906:sramecc+:xnack- │        64 │ 15.984 GiB │
│  2 │ AMD Radeon RX 6600 │ gfx1032                │        32 │  7.984 GiB │
└────┴────────────────────┴────────────────────────┴───────────┴────────────┘
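Two devices are listed, and AMDGPU.jl runs on device 1 by default. A minimal sketch for switching to the second device, assuming the device!-style selection API from the AMDGPU.jl docs:

```julia
using AMDGPU

# Make the RX 6600 (Id 2 in the table above) the current device for this task.
AMDGPU.device!(AMDGPU.devices()[2])
AMDGPU.device()   # confirm which device is now active
```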
julia> # CPU version
a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
0.1758 0.2559 0.8525 0.0625 0.987
0.0957 0.4429 0.949 0.593 0.4824
0.46 0.945 0.9917 0.738 0.010254
0.779 0.7344 0.9824 0.544 0.0332
0.503 0.977 0.31 0.3086 0.523
julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
-0.3242 -0.2441 0.3525 -0.4375 0.4868
-0.4043 -0.05713 0.4492 0.0928 -0.01758
-0.04004 0.4448 0.4917 0.2378 -0.4897
0.2788 0.2344 0.4824 0.04395 -0.4668
0.00293 0.477 -0.19 -0.1914 0.02295
julia> # GPU version 1
a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.3027 0.502 0.3276 0.0796 0.456
0.1606 0.4282 0.1875 0.816 0.2573
0.5347 0.8003 0.5215 0.103 0.0908
0.7695 0.8228 0.802 0.8037 0.187
0.475 0.1553 0.608 0.8735 0.25
julia> z_d = a_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
-0.1973 0.001953 -0.1724 -0.4204 -0.04395
-0.3394 -0.0718 -0.3125 0.316 -0.2427
0.03467 0.3003 0.02148 -0.397 -0.4092
0.2695 0.3228 0.3018 0.3037 -0.313
-0.0249 -0.3447 0.1079 0.3735 -0.25
julia> # GPU version 2
b_d = AMDGPU.rand(Float16,5,5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.674 0.4595 0.624 0.0912 0.821
0.02998 0.4895 0.02676 0.385 0.4805
0.522 0.978 0.4788 0.684 0.8164
0.1853 0.9688 0.39 0.3337 0.5186
0.00983 0.3857 0.4546 0.846 0.3872
julia> y_d = b_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.1738 -0.04053 0.124 -0.4087 0.3208
-0.47 -0.0105 -0.4731 -0.115 -0.01953
0.02197 0.478 -0.02124 0.1841 0.3164
-0.3147 0.4688 -0.1101 -0.1663 0.01855
-0.4902 -0.11426 -0.0454 0.3462 -0.1128
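As a sanity check, a device result can be copied back to the host and compared against the CPU computation. A minimal sketch using only the API already shown above (0.5 is exactly representable in Float16, so the two results should match exactly):

```julia
using AMDGPU

a_h = rand(Float16, 5, 5)      # host input
a_d = ROCMatrix(a_h)           # same data on the device
z_h = a_h .- Float16(0.5)      # CPU broadcast
z_d = a_d .- Float16(0.5)      # GPU broadcast
@assert Array(z_d) == z_h      # copy back and compare elementwise
```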
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:
  Note: This is an unofficial build, please report bugs to the project
  responsible for this build and not to the Julia project unless you can
  reproduce the issue using official builds available at https://julialang.org/downloads
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen Threadripper 1950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
Threads: 8 default, 0 interactive, 4 GC (on 32 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/opt/rocm-6.1.0-13294/lib:/nvme/home/joe/local/lib
7950X
julia> using AMDGPU
julia> AMDGPU.devices()
┌────┬─────────────────────┬──────────┬───────────┬───────────┐
│ Id │ Name                │ GCN arch │ Wavefront │ Memory    │
├────┼─────────────────────┼──────────┼───────────┼───────────┤
│  1 │ AMD Radeon Graphics │ gfx1030  │        32 │ 8.000 GiB │
└────┴─────────────────────┴──────────┴───────────┴───────────┘
julia> a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
0.2427 0.2471 0.9004 0.56 0.273
0.5806 0.3276 0.943 0.5425 0.4692
0.267 0.1074 0.5127 0.543 0.418
0.708 0.8306 0.273 0.2222 0.929
0.9204 0.5894 0.561 0.09766 0.1562
julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
-0.2573 -0.253 0.4004 0.06006 -0.227
0.08057 -0.1724 0.4429 0.04248 -0.03076
-0.2329 -0.3926 0.012695 0.04297 -0.08203
0.208 0.3306 -0.227 -0.2778 0.4292
0.4204 0.08936 0.06104 -0.4023 -0.3438
julia> a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.6113 0.4038 0.931 0.2935 0.8135
0.02002 0.994 0.3389 0.249 0.508
0.1992 0.5254 0.963 0.4 0.749
0.844 0.709 0.1333 0.3687 0.9595
0.1138 0.4258 0.2104 0.735 0.294
julia> z_d = a_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.1113 -0.0962 0.4312 -0.2065 0.3135
-0.48 0.4941 -0.1611 -0.251 0.007812
-0.3008 0.02539 0.463 -0.1001 0.249
0.3442 0.209 -0.3667 -0.1313 0.4595
-0.3862 -0.0742 -0.2896 0.2349 -0.206
julia> # GPU version 2
b_d = AMDGPU.rand(Float16,5,5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.7783 0.3125 0.989 0.4648 0.1595
0.7236 0.7017 0.8687 0.3203 0.914
0.962 0.72 0.03864 0.386 0.156
0.1991 0.754 0.69 0.517 0.9272
0.5283 0.822 0.859 0.2283 0.7993
julia> y_d = b_d .- Float16(0.5)
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.2783 -0.1875 0.4888 -0.03516 -0.3403
0.2236 0.2017 0.3687 -0.1797 0.414
0.462 0.2202 -0.4614 -0.114 -0.344
-0.3008 0.254 0.19 0.01709 0.4272
0.02832 0.3218 0.359 -0.2717 0.2993
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:
  Note: This is an unofficial build, please report bugs to the project
  responsible for this build and not to the Julia project unless you can
  reproduce the issue using official builds available at https://julialang.org/downloads
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 32 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/local/cuda-12.3/lib64:/nvme/home/joe/local/lib
  JULIA_HOME = /nvme/home/joe/local
Are we missing something needed to support gfx942, @pxl-th?
Note: gfx942 is new and not widely available, so I didn't expect everything to work. I'm happy to work on this with you though.
Probably because of Julia 1.10's LLVM version, which is 15, while gfx942 was only officially added in LLVM 17, IIUC: https://github.com/llvm/llvm-project/commit/9d0572797233857397f3fdc35fffcfb490354f56
You can try the Julia 1.11 early release (which has LLVM 16), but I haven't tested it with AMD GPUs at all yet. In the worst case, we'd have to wait for LLVM 17 to arrive in Julia, which is this PR: https://github.com/JuliaLang/julia/pull/53070
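For reference, the bundled LLVM version can be checked directly from the REPL; Base.libllvm_version is part of Base:

```julia
julia> Base.libllvm_version   # gfx942 needs LLVM >= 17
v"15.0.7"
```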
Julia 1.11:
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-beta1 (2024-04-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using AMDGPU
Precompiling AMDGPU...
Info Given AMDGPU was explicitly requested, output will be shown live
ERROR: LoadError: UndefVarError: `CodeCache` not defined in `GPUCompiler`
Stacktrace:
[1] getproperty(x::Module, f::Symbol)
@ Base ./Base.jl:42
[2] top-level scope
@ ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:75
[3] include
@ ./Base.jl:558 [inlined]
[4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
@ Base ./loading.jl:2721
[5] top-level scope
@ stdin:4
in expression starting at ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:1
in expression starting at stdin:4
✗ AMDGPU
0 dependencies successfully precompiled in 5 seconds. 108 already precompiled.
ERROR: The following 1 direct dependency failed to precompile:
AMDGPU
Failed to precompile AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e] to "~/.julia/compiled/v1.11/AMDGPU/jl_hqPvGn".
ERROR: LoadError: UndefVarError: `CodeCache` not defined in `GPUCompiler`
Stacktrace:
[1] getproperty(x::Module, f::Symbol)
@ Base ./Base.jl:42
[2] top-level scope
@ ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:75
[3] include
@ ./Base.jl:558 [inlined]
[4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
@ Base ./loading.jl:2721
[5] top-level scope
@ stdin:4
in expression starting at ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:1
in expression starting at stdin:4
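An UndefVarError for a GPUCompiler internal like CodeCache typically means the resolved AMDGPU/GPUCompiler versions don't match the Julia release. A plausible first step (a sketch only; the right versions depend on the AMDGPU.jl compat entries) is to update the pair and inspect what resolves:

```julia
using Pkg

Pkg.update(["AMDGPU", "GPUCompiler"])   # pull releases compatible with this Julia
Pkg.status(["AMDGPU", "GPUCompiler"])   # check which versions actually resolved
```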
AMDGPU 0.9 now supports Julia 1.11 and maybe the MI300X. Just make sure to launch Julia with the JULIA_LLVM_ARGS="-opaque-pointers" environment variable set, so that it uses the system-wide ROCm device libraries instead of our patched ones.
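Concretely, the variable must be set in the environment before Julia starts, for example:

```
$ JULIA_LLVM_ARGS="-opaque-pointers" julia
```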
Just got a similar issue to the original post with Julia 1.11.0-beta2, ROCm 6.1.2, and AMDGPU 0.9.5, both with and without setting JULIA_LLVM_ARGS="-opaque-pointers".
julia> a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.644 0.2002 0.208 0.4048 0.6567
0.774 0.4253 0.667 0.03662 0.1997
0.7725 0.6445 0.95 0.2876 0.715
0.2764 0.4453 0.6836 0.4277 0.1118
0.02197 0.5454 0.3564 0.354 0.8027
julia> z_d = a_d .- Float16(0.5)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
...
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#35#37")(::AMDGPU.ROCKernelContext, ::AMDGPU.Device.ROCDeviceMatrix{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to pointerset(ptr::Core.LLVMPtr{T, A}, x::T, i::I, ::Val{align}) where {T, A, I, align} @ LLVM.Interop none:0)
Stacktrace:
[1] unsafe_store! (repeats 3 times)
@ /workspace/packages/LLVM/6cDbl/src/interop/pointer.jl:88
[2] malloc_hc
@ /workspace/packages/AMDGPU/OUSjX/src/device/runtime.jl:98
[3] malloc
@ /workspace/packages/AMDGPU/OUSjX/src/device/gcn/memory_dynamic.jl:12
[4] malloc
@ /workspace/packages/GPUCompiler/nWT2N/src/runtime.jl:88
[5] macro expansion
@ /workspace/packages/GPUCompiler/nWT2N/src/runtime.jl:183
[6] macro expansion
@ ./none:0
[7] box
@ ./none:0
[8] box_uint64
@ /workspace/packages/GPUCompiler/nWT2N/src/runtime.jl:212
[9] multiple call sites
@ unknown:0
...
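To narrow down whether the broadcast machinery or basic gfx942 codegen is at fault, a hand-written kernel doing the same elementwise subtraction could be tried. A sketch, assuming AMDGPU.jl's standard indexing intrinsics and @roc launch macro (with gridsize taken as the number of workgroups; check the docs for your version):

```julia
using AMDGPU

# Same elementwise op as the failing broadcast, written as an explicit kernel,
# to separate broadcast-machinery problems from basic gfx942 codegen problems.
function sub_half!(z, a)
    i = workitemIdx().x + (workgroupIdx().x - Int32(1)) * workgroupDim().x
    if i <= length(z)
        @inbounds z[i] = a[i] - Float16(0.5)
    end
    return
end

a_d = ROCArray(rand(Float16, 25))
z_d = similar(a_d)
@roc groupsize=32 gridsize=cld(length(a_d), 32) sub_half!(z_d, a_d)
AMDGPU.synchronize()
Array(z_d)   # copy back to inspect the result
```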
I have been testing on Runpod and built a Julia 1.11-rc AMD ROCm template you can use to deploy an MI300X. I am happy to help with any debugging as well.
We then need Julia 1.12, which has LLVM 17 (1.11 has LLVM 16). I haven't tested it yet, as 1.11 itself is still in beta, but I can take a look shortly.
I just built Julia from source (and added version 17 to the compatible versions of LLD_jll and LLVM_jll for AMDGPU), but got the same issue:
# ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.706 (2024-06-11)
 _/ |\__'_|_|_|\__'_|  |  Commit e7893a1fa4 (0 days old master)
|__/                   |
julia> versioninfo()
Julia Version 1.12.0-DEV.706
Commit e7893a1fa4 (2024-06-11 09:53 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 192 × AMD EPYC 9474F 48-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 192 virtual cores)
Environment:
  JULIA_DEPOT_PATH = /root/
julia> using AMDGPU
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────┐
│ Available │ Name             │ Version   │ Path                                                                    │
├───────────┼──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ +         │ LLD              │ -         │ /opt/rocm/llvm/bin/ld.lld                                               │
│ +         │ Device Libraries │ -         │ /root/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode │
│ +         │ HIP              │ 6.1.40093 │ /opt/rocm/lib/libamdhip64.so                                            │
│ +         │ rocBLAS          │ 4.1.2     │ /opt/rocm/lib/librocblas.so.4                                           │
│ +         │ rocSOLVER        │ 3.25.0    │ /opt/rocm/lib/librocsolver.so.0                                         │
│ +         │ rocALUTION       │ -         │ /opt/rocm/lib/librocalution.so.1                                        │
│ +         │ rocSPARSE        │ -         │ /opt/rocm/lib/librocsparse.so.1                                         │
│ +         │ rocRAND          │ 2.10.5    │ /opt/rocm/lib/librocrand.so.1                                           │
│ +         │ rocFFT           │ 1.0.27    │ /opt/rocm/lib/librocfft.so.0                                            │
│ +         │ MIOpen           │ 3.1.0     │ /opt/rocm/lib/libMIOpen.so.1                                            │
└───────────┴──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────┘
[ Info: AMDGPU devices
┌────┬─────────────────────┬────────────────────────┬───────────┬─────────────┐
│ Id │ Name                │ GCN arch               │ Wavefront │ Memory      │
├────┼─────────────────────┼────────────────────────┼───────────┼─────────────┤
│  1 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
└────┴─────────────────────┴────────────────────────┴───────────┴─────────────┘
julia> a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.5596 0.292 0.8354 0.3677 0.641
0.1567 0.978 0.4614 0.2144 0.717
0.4023 0.8706 0.9004 0.9033 0.2319
0.3042 0.3652 0.48 0.02197 0.1309
0.7817 0.1909 0.4595 0.3193 0.846
julia> z_d = a_d .- Float16(0.5)
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#35#37")(::AMDGPU.ROCKernelContext, ::AMDGPU.Device.ROCDeviceMatrix{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to pointerset(ptr::Core.LLVMPtr{T, A}, x::T, i::I, ::Val{align}) where {T, A, I, align} @ LLVM.Interop none:0)
Stacktrace:
[1] unsafe_store! (repeats 3 times)
@ ~/packages/LLVM/6cDbl/src/interop/pointer.jl:88
...
Notably, the "'gfx942' is not a recognized processor for this target (ignoring processor)" messages are gone now.
AMDGPU.jl needs to account for changes in Julia 1.12; I haven't done that yet.
Can you give an indication of what needs to be done? I can't promise anything, but I may or may not have a chance to look into this (if it doesn't take too long :smiling_face_with_tear:)