JuliaGPU / AMDGPU.jl

AMD GPU (ROCm) programming in Julia

AMDGPU fails test and crashes when initialized #570

Closed: jw2249a closed this issue 10 months ago

jw2249a commented 10 months ago

OS: Ubuntu 22.04.3
GPU: Radeon RX 7900 XTX
ROCm version: 5.7.1 (installed with amdgpu-install)
Julia version: v1.9.4

Both the test suite and the import fail with:
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.

clinfo shows:

  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.1 AMD-APP (3590.0)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback 

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               1
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    Radeon RX 7900 XTX
  Device Topology:               PCI[ B#12, D#0, F#0 ]
  Max compute units:                 48
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               2371Mhz
  Address bits:                  64
  Max memory allocation:             21890072576
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                16384
  Max image 3D height:               16384
  Max image 3D depth:                8192
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    32768
  Global memory size:                25753026560
  Constant buffer size:              21890072576
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 65536
  Max pipe arguments:                16
  Max pipe active reservations:          16
  Max pipe packet size:              415236096
  Max global variable size:          21890072576
  Max global variable preferred total size:  25753026560
  Max read/write image args:             64
  Max on device events:              1024
  Queue on device max size:          8388608
  Max on device queues:              1
  Queue on device preferred size:        262144
  SVM capabilities:              
    Coarse grain buffer:             Yes
    Fine grain buffer:               Yes
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     32
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                Yes
    Profiling :                  Yes
  Platform ID:                   0x7f510bbf0f90
  Name:                      gfx1100
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 2.0 
  Driver version:                3590.0 (HSA1.1,LC)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 2.0 
  Extensions:                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

This is what the crash looks like:

julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.

[30800] signal (6.-6): Aborted
in expression starting at REPL[6]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f396222871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZNK4rocr3AMD8GpuAgent14AssembleShaderEPKcNS1_14AssembleTargetERPvRm at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent15BindTrapHandlerEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent13PostToolsInitEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime4LoadEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime7AcquireEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3HSA8hsa_initEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/jrw/.julia/packages/AMDGPU/bzHD4/src/hsa/LibHSARuntime.jl:71 [inlined]
__init__ at /home/jrw/.julia/packages/AMDGPU/bzHD4/src/AMDGPU.jl:245
jl_sysimg_fvars_base at /home/jrw/.julia/compiled/v1.9/AMDGPU/arpZD_ObjvJ.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
jl_module_run_initializer at /home/jrw/julia/src/toplevel.c:75
ijl_init_restored_modules at /home/jrw/julia/src/module.c:982
register_restored_modules at ./loading.jl:1115
_include_from_serialized at ./loading.jl:1061
_require_search_from_serialized at ./loading.jl:1506
_require at ./loading.jl:1783
_require_prelocked at ./loading.jl:1660
macro expansion at ./loading.jl:1648 [inlined]
macro expansion at ./lock.jl:267 [inlined]
require at ./loading.jl:1611
jfptr_require_48600 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
call_require at /home/jrw/julia/src/toplevel.c:466 [inlined]
eval_import_path at /home/jrw/julia/src/toplevel.c:503
jl_toplevel_eval_flex at /home/jrw/julia/src/toplevel.c:731
jl_toplevel_eval_flex at /home/jrw/julia/src/toplevel.c:856
ijl_toplevel_eval_in at /home/jrw/julia/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
eval_user_input at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:153
repl_backend_loop at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:249
#start_repl_backend#46 at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:234
start_repl_backend at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:231
#run_repl#59 at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:379
run_repl at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:365
jfptr_run_repl_61323 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
#1018 at ./client.jl:421
jfptr_YY.1018_45017 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
jl_f__call_latest at /home/jrw/julia/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:819 [inlined]
invokelatest at ./essentials.jl:816 [inlined]
run_main_repl at ./client.jl:405
exec_options at ./client.jl:322
_start at ./client.jl:522
jfptr__start_52365 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
true_main at /home/jrw/julia/src/jlapi.c:573
jl_repl_entrypoint at /home/jrw/julia/src/jlapi.c:717
main at julia (unknown line)
unknown function (ip: 0x7f3962229d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at julia (unknown line)
Allocations: 14667144 (Pool: 14651509; Big: 15635); GC: 25
Aborted (core dumped)
pxl-th commented 10 months ago

Navi 3 is supported only on Julia 1.10+, but I'm not sure that will fix your error...
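Since this is a hard version requirement, it is worth checking before digging further. A small sketch, assuming GNU coreutils (for `sort -V`) and that `julia --version` prints a line of the form `julia version X.Y.Z`; the helper name `julia_at_least` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version string $2 is >= minimum version $1.
# GNU "sort -V" orders version strings; if the minimum sorts first, $2 >= $1.
julia_at_least() {
    local min="$1" have="$2"
    [ "$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)" = "$min" ]
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]] && command -v julia >/dev/null 2>&1; then
    # "julia --version" prints e.g. "julia version 1.9.4"; keep the third field
    have=$(julia --version | awk '{print $3}')
    if julia_at_least 1.10 "$have"; then
        echo "Julia $have meets the 1.10+ requirement for Navi 3"
    else
        echo "Julia $have is too old for Navi 3; upgrade to 1.10 or newer"
    fi
fi
```

With the versions from this thread, 1.9.4 fails the check while 1.10.0-rc2 and 1.10.5 pass.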

jw2249a commented 10 months ago

> Navi 3 is supported only on Julia 1.10+, but I'm not sure that will fix your error...

@pxl-th I'm recompiling it now. I think it may be a permissions or directory-lookup issue: when I ran Julia as superuser (with sudo), the tests reported the error saying Navi 3 is only supported on Julia 1.10+, and Julia didn't crash immediately.

pxl-th commented 10 months ago

Make sure your user is in the group that owns /dev/kfd (see the docs).
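A minimal sketch of that check, assuming a Linux system with GNU coreutils `stat` and `id`. It compares the numeric group that owns the device node with the groups of the current user. The helper name `in_device_group` is hypothetical; AMD's ROCm install guide has users add themselves to the `render` and `video` groups with `usermod`, after which a fresh login is needed for the change to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: is the current user in the group that owns a device
# node? Defaults to /dev/kfd, the ROCm compute device created by amdgpu.
# Exit status: 0 = member, 1 = not a member, 2 = device node missing.
in_device_group() {
    local dev="${1:-/dev/kfd}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev does not exist (is the amdgpu kernel driver loaded?)"
        return 2
    fi
    local gid name
    gid=$(stat -c '%g' "$dev")    # numeric id of the group owning the node
    name=$(stat -c '%G' "$dev")   # its name, used only in the messages
    # id -G lists every group id of the current user; match one whole line
    if id -G | tr ' ' '\n' | grep -qxF "$gid"; then
        echo "ok: $(id -un) is in group '$name' (owner of $dev)"
        return 0
    fi
    echo "not a member of '$name': run  sudo usermod -a -G $name \"\$LOGNAME\"  and log in again"
    return 1
}

# Run directly with an explicit path, e.g.:  bash check_kfd.sh /dev/kfd
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
    in_device_group "$1"
fi
```

Comparing numeric group ids rather than names keeps the check working even when a group has no name entry on the system.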

pxl-th commented 10 months ago

Also, Navi 3 may hang during tests; I'm not sure why. That only happens on Linux and may be a Linux kernel issue.

pxl-th commented 10 months ago

However, outside of the AMDGPU.jl tests it works fine.

jw2249a commented 10 months ago

@pxl-th Upgrading to Julia 1.10.0-rc2 got it working, although the tests still fail. I'll close this since it works now.

kalmarek commented 1 month ago

I have the same problem with:

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

julia> ENV["HSA_OVERRIDE_GFX_VERSION"]
"11.0.0"

Full error:

julia> using AMDGPU
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.

[171545] signal (6.-6): Aborted
in expression starting at REPL[3]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x78a94d62871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZNK4rocr3AMD8GpuAgent14AssembleShaderEPKcNS1_14AssembleTargetERPvRm at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent15BindTrapHandlerEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent13PostToolsInitEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime4LoadEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime7AcquireEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3HSA8hsa_initEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
unknown function (ip: 0x78a8bc86b4de)
unknown function (ip: 0x78a8bc7caebb)
unknown function (ip: 0x78a8bc862c95)
unknown function (ip: 0x78a8bc52fca9)
hipRuntimeGetVersion at /home/kalmar/.julia/artifacts/3e4a5c18581a48180ab1525d3d490a2e2552616f/hip/lib/libamdhip64.so (unknown line)
_hip_runtime_version at /home/kalmar/.julia/packages/AMDGPU/a1v0k/src/discovery/discovery.jl:87
__init__ at /home/kalmar/.julia/packages/AMDGPU/a1v0k/src/discovery/discovery.jl:144
jfptr___init___5505 at /home/kalmar/.julia/compiled/v1.10/AMDGPU/arpZD_llESY.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_module_run_initializer at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:76
run_module_init at ./loading.jl:1134
register_restored_modules at ./loading.jl:1122
_include_from_serialized at ./loading.jl:1067
_require_search_from_serialized at ./loading.jl:1581
_require at ./loading.jl:1938
__require_prelocked at ./loading.jl:1812
jfptr___require_prelocked_80833.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
_require_prelocked at ./loading.jl:1803
macro expansion at ./loading.jl:1790 [inlined]
macro expansion at ./lock.jl:267 [inlined]
__require at ./loading.jl:1753
jfptr___require_80798.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
require at ./loading.jl:1746
jfptr_require_80795.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
call_require at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:481 [inlined]
eval_import_path at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:518
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:752
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91805.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82772.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82798.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x78a94d629d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 26510462 (Pool: 26475799; Big: 34663); GC: 38
[1]    171545 IOT instruction (core dumped)  julia

rocminfo:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5289                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    28505104(0x1b2f410) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    28505104(0x1b2f410) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    28505104(0x1b2f410) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 5567(0x15bf)                       
  ASIC Revision:           9(0x9)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2700                               
  BDFID:                   49920                              
  Internal Node ID:        1                                  
  Compute Unit:            12                                 
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 39                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
pxl-th commented 1 month ago

What OS are you on? Also, you don't need to set ENV["HSA_OVERRIDE_GFX_VERSION"].

kalmarek commented 1 month ago

It's Ubuntu 22.04 LTS with the HWE kernel:

$ uname -a
Linux hp-845-g10 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

ROCm 6.2 is installed the official way, following the instructions in the AMD docs.

pxl-th commented 1 month ago

Hm... I'm on ROCm 6.1.2 (as are our CI machines). Let me try 6.2; maybe something has changed.

pxl-th commented 1 month ago

I just installed ROCm 6.2 on Ubuntu 22.04 and it works without issues. I used the amdgpu-install script: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html

pxl-th commented 1 month ago

Do you have AMDGPU artifacts enabled? I see references to them in your stack trace. If so, disable them (with the code below, or by removing the LocalPreferences.toml file) and try using your system-wide ROCm installation:

julia> AMDGPU.ROCmDiscovery.use_artifacts!(false)

We should probably remove them for now so as not to confuse users, since they are quite old.
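The telltale sign in the stack traces above is the library path: frames such as `hsa_init at /home/.../.julia/artifacts/<hash>/lib/libhsa-runtime64.so` show that the HSA runtime was loaded from a Julia artifact rather than from the system ROCm install (by default under /opt/rocm). A minimal sketch that pulls those paths out of a saved crash log, assuming the log was captured to a file; both helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a saved Julia crash log ($1 = path to the log).
# Frames look like "<symbol> at <path> (unknown line)"; extract the distinct
# paths of the HSA runtime library.
hsa_runtime_paths() {
    grep -oE '/[^ ]*/libhsa-runtime64\.so[^ ]*' "$1" | sort -u
}

# Print "artifact" if any HSA runtime path lies under ~/.julia/artifacts,
# "system" if paths were found but none are artifacts, "unknown" otherwise.
classify_hsa_source() {
    local paths
    paths=$(hsa_runtime_paths "$1")
    if [ -z "$paths" ]; then
        echo "unknown"
    elif grep -q '/\.julia/artifacts/' <<<"$paths"; then
        echo "artifact"
    else
        echo "system"
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
    hsa_runtime_paths "$1"
    classify_hsa_source "$1"
fi
```

Run on either trace in this thread, it would report a single artifacts path under `4df816456579cea2c03cac08a6b82fa87abe2b38` and classify it as `artifact`.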

kalmarek commented 1 month ago

@pxl-th Which options did you use for amdgpu-install? I used this:

$ amdgpu-install --usecase=graphics,opencl,hip,rocm --opencl=rocr --no-32 

I can't even disable the artifacts, because simply loading AMDGPU (using AMDGPU) crashes the whole Julia session.

luraess commented 1 month ago

> I can't even disable the artifacts, because simply loading AMDGPU (using AMDGPU) crashes the whole Julia session.

You could add a LocalPreferences.toml file to your working directory or project that sets the artifact option:

$ cat LocalPreferences.toml
[AMDGPU]
use_artifacts = false

and then try using AMDGPU with this file in place.
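Because the setting lives in a plain file, it can be written without loading AMDGPU at all, which sidesteps the crash. A minimal sketch, assuming the target directory is a Julia project whose Project.toml lists AMDGPU and that Julia is then started with `--project` pointing at it, so this is the LocalPreferences.toml that gets read. The helper name `disable_amdgpu_artifacts` is hypothetical, and it refuses to overwrite an existing file rather than attempting to merge TOML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a LocalPreferences.toml that disables
# AMDGPU.jl's ROCm artifacts. $1 = project directory (default: current dir).
# Never overwrites an existing file; edit that one by hand instead.
disable_amdgpu_artifacts() {
    local project_dir="${1:-.}"
    local prefs="$project_dir/LocalPreferences.toml"
    if [ -e "$prefs" ]; then
        echo "exists: $prefs; edit its [AMDGPU] section by hand" >&2
        return 1
    fi
    cat > "$prefs" <<'EOF'
[AMDGPU]
use_artifacts = false
EOF
    echo "wrote $prefs"
}

# Run directly with an explicit project directory, e.g.:
#   bash disable_artifacts.sh ~/myproject
#   julia --project=~/myproject -e 'using AMDGPU'
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
    disable_amdgpu_artifacts "$1" || exit 1
fi
```

Refusing to overwrite is deliberate: an existing LocalPreferences.toml may hold settings for other packages, and clobbering it would silently drop them.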

kalmarek commented 1 month ago

That worked! Thanks, @luraess.

pxl-th commented 1 month ago

> @pxl-th Which options did you use for amdgpu-install? I used this:
>
> $ amdgpu-install --usecase=graphics,opencl,hip,rocm --opencl=rocr --no-32
>
> I can't even disable the artifacts, because simply loading AMDGPU (using AMDGPU) crashes the whole Julia session.

Same, just without these flags: --opencl=rocr --no-32

pxl-th commented 1 month ago

I wonder why it defaulted to artifacts, since they are disabled by default.

pxl-th commented 1 month ago

But we should just remove them for now.

luraess commented 1 month ago

Does #673 help here, @pxl-th?

kalmarek commented 1 month ago

> I wonder why it defaulted to artifacts, since they are disabled by default.

It is possible that I had placed the file there while experimenting with AMDGPU on my old laptop. However, this seems rather unlikely, as that was at least two years ago (and Julia 1.10 wasn't out yet?).