Closed jw2249a closed 10 months ago
Navi 3 is supported only on Julia 1.10+, but I'm not sure that will fix your error...
@pxl-th I'm recompiling it now. I think it may be a permissions or directory-search issue: when I run julia as a superuser with sudo, the test reports that Navi 3 is supported only on Julia 1.10+, and it doesn't immediately crash.
Also, Navi 3 may hang during tests, I'm not sure why. That only happens on Linux and may be a Linux kernel issue
However, outside of AMDGPU.jl tests it works fine
@pxl-th Upgrading to 1.10.0-rc2 got it working, although the tests still fail. I'll close this because it works now.
I have the same problem with
julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 7 7840U w/ Radeon 780M Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
JULIA_NUM_THREADS = 8
julia> ENV["HSA_OVERRIDE_GFX_VERSION"]
"11.0.0"
Full error:
julia> using AMDGPU
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.
[171545] signal (6.-6): Aborted
in expression starting at REPL[3]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x78a94d62871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZNK4rocr3AMD8GpuAgent14AssembleShaderEPKcNS1_14AssembleTargetERPvRm at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent15BindTrapHandlerEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent13PostToolsInitEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime4LoadEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime7AcquireEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3HSA8hsa_initEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
unknown function (ip: 0x78a8bc86b4de)
unknown function (ip: 0x78a8bc7caebb)
unknown function (ip: 0x78a8bc862c95)
unknown function (ip: 0x78a8bc52fca9)
hipRuntimeGetVersion at /home/kalmar/.julia/artifacts/3e4a5c18581a48180ab1525d3d490a2e2552616f/hip/lib/libamdhip64.so (unknown line)
_hip_runtime_version at /home/kalmar/.julia/packages/AMDGPU/a1v0k/src/discovery/discovery.jl:87
__init__ at /home/kalmar/.julia/packages/AMDGPU/a1v0k/src/discovery/discovery.jl:144
jfptr___init___5505 at /home/kalmar/.julia/compiled/v1.10/AMDGPU/arpZD_llESY.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_module_run_initializer at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:76
run_module_init at ./loading.jl:1134
register_restored_modules at ./loading.jl:1122
_include_from_serialized at ./loading.jl:1067
_require_search_from_serialized at ./loading.jl:1581
_require at ./loading.jl:1938
__require_prelocked at ./loading.jl:1812
jfptr___require_prelocked_80833.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
_require_prelocked at ./loading.jl:1803
macro expansion at ./loading.jl:1790 [inlined]
macro expansion at ./lock.jl:267 [inlined]
__require at ./loading.jl:1753
jfptr___require_80798.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
require at ./loading.jl:1746
jfptr_require_80795.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
call_require at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:481 [inlined]
eval_import_path at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:518
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:752
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91805.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82772.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82798.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x78a94d629d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 26510462 (Pool: 26475799; Big: 34663); GC: 38
[1] 171545 IOT instruction (core dumped) julia
rocminfo:
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 7840U w/ Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 7840U w/ Radeon 780M Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5289
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 28505104(0x1b2f410) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 28505104(0x1b2f410) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 28505104(0x1b2f410) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 5567(0x15bf)
ASIC Revision: 9(0x9)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2700
BDFID: 49920
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 39
SDMA engine uCode:: 18
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304(0x400000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 4194304(0x400000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
What OS are you on?
Also you don't need to specify ENV["HSA_OVERRIDE_GFX_VERSION"]
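To compare what the runtime reports with and without HSA_OVERRIDE_GFX_VERSION set, the ISA names can be pulled out of a rocminfo dump like the one above. This is only a sketch: the function name list_gfx_targets is made up for illustration, and it assumes ISA lines have the form "amdgcn-amd-amdhsa--gfxNNNN" as in the output posted here.

```shell
#!/bin/sh
# Hypothetical helper: read rocminfo output on stdin and print each
# distinct gfx target that appears in an ISA name.
list_gfx_targets() {
    # ISA lines look like: "Name: amdgcn-amd-amdhsa--gfx1100"
    grep -Eo 'amdgcn-amd-amdhsa--gfx[0-9a-z]+' | sed 's/.*--//' | sort -u
}
```

Typical use would be `rocminfo | list_gfx_targets`, run once with the variable exported and once after `unset HSA_OVERRIDE_GFX_VERSION`.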
it's ubuntu 22.04 LTS with HWE
$ uname -a
Linux hp-845-g10 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Official ROCm 6.2 installation, following the instructions from the AMD docs.
Hm... I'm on ROCm 6.1.2 (as well as our CI machines), let me try 6.2, maybe something has changed.
I just installed ROCm 6.2 on Ubuntu 22.04 and it works without issues. I used AMDGPU install script: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html
Do you have AMDGPU artifacts enabled? I see references to them in your stacktrace. If so, disable them (with the code below or by removing the LocalPreferences.toml file) and try using your system-wide ROCm installation:
julia> AMDGPU.ROCmDiscovery.use_artifacts!(false)
We probably should remove them for now, to not confuse users, since they are quite old.
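One way to check for the artifact references mentioned above is to save the crash output to a file and search it for library paths under Julia's artifact store. A rough sketch, assuming the default depot location (~/.julia) and the two library names that show up in the stacktrace in this thread; the function name uses_julia_artifacts is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a saved crash log shows ROCm libraries
# being loaded from Julia's artifact store rather than a system ROCm install.
uses_julia_artifacts() {
    # Matches frames such as ".../.julia/artifacts/<hash>/lib/libhsa-runtime64.so"
    grep -Eq '/\.julia/artifacts/[0-9a-f]+/.*lib(hsa-runtime64|amdhip64)\.so' "$1"
}
```

For example, `uses_julia_artifacts crash.log && echo "artifacts in use"`, where crash.log is whatever file the output was saved to.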
@pxl-th Which options did you use for amdgpu-install? I used this:
$ amdgpu-install --usecase=graphics,opencl,hip,rocm --opencl=rocr --no-32
I can't even disable the artifacts, as a simple using AMDGPU segfaults the whole julia session.
You could add a LocalPreferences.toml file in your working dir or project that includes the artifact info:
$ cat LocalPreferences.toml
[AMDGPU]
use_artifacts = false
and then try using AMDGPU with this.
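For anyone landing here with the same crash, the workaround above can be scripted. This is a minimal sketch, not an official tool: the function name disable_amdgpu_artifacts is made up for illustration, and it assumes the preference file belongs in the directory passed as the first argument (the current directory by default). It refuses to touch an existing file so it does not clobber other preferences.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMDGPU.jl): write a LocalPreferences.toml
# that turns the bundled ROCm artifacts off, as suggested above.
disable_amdgpu_artifacts() {
    dir="${1:-.}"                      # target project directory, defaults to cwd
    file="$dir/LocalPreferences.toml"
    if [ -e "$file" ]; then
        # Do not overwrite existing preferences; edit the file by hand instead.
        echo "refusing to overwrite existing $file" >&2
        return 1
    fi
    printf '[AMDGPU]\nuse_artifacts = false\n' > "$file"
}
```

After running `disable_amdgpu_artifacts .` in the project directory, start Julia from that project and try using AMDGPU again.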
that worked! thanks @luraess
Same amdgpu-install options as yours, except for these flags: --opencl=rocr --no-32
Wonder why it did default to artifacts, since by default they are disabled.
But we should just remove them for now.
Is #673 helping here @pxl-th ?
As for why it defaulted to artifacts: it is possible that I placed the file there when I experimented with AMDGPU on my old laptop. However, this seems rather unlikely, as that was at least 2 years ago (and julia-1.10 was not there yet?).
OS: Ubuntu 22.04.3
GPU: 7900 XTX
ROCm Version: 5.7.1 (installed with amdgpu-installer)
Julia Version: v1.9.4
Both the test and the import fail with
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.
clinfo shows
This is what the crash looks like