Open alfredopalhares opened 3 years ago
Can you please check which kernel version you are running (e.g. the output of "uname -a")? Canonical pushed a kernel update out to some 20.04.1 users bumping it to 5.8, and the current DKMS kernel will not install on 5.8.
Thank you for your reply.
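For anyone who wants to script the kernel check described above, here is a minimal sketch. The helper name kernel_at_least is made up for illustration, and the 5.8 threshold is simply the one mentioned in this comment.

```shell
# kernel_at_least VERSION MAJOR MINOR
# Succeeds (exit 0) when VERSION, a string such as the output of `uname -r`,
# is at least MAJOR.MINOR. Hypothetical helper, not part of ROCm.
kernel_at_least() {
    ver_major=${1%%.*}               # text before the first dot, e.g. "5"
    ver_rest=${1#*.}                 # text after the first dot, e.g. "4.0-65-generic"
    ver_minor=${ver_rest%%[!0-9]*}   # leading digits of the rest, e.g. "4"
    [ "$ver_major" -gt "$2" ] ||
        { [ "$ver_major" -eq "$2" ] && [ "$ver_minor" -ge "$3" ]; }
}
```

It could be used as: if kernel_at_least "$(uname -r)" 5 8; then echo "kernel is 5.8 or newer"; fi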
$ uname -a
Linux myhost 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
So it's a 5.4 kernel. Also, here are my amd modules:
$ lsmod | grep amd
amdgpu 5824512 0
amd_iommu_v2 20480 1 amdgpu
amd_sched 32768 1 amdgpu
amdttm 102400 1 amdgpu
amdkcl 24576 2 amdttm,amdgpu
i2c_algo_bit 16384 2 amdgpu,i915
drm_kms_helper 184320 2 amdgpu,i915
drm 491520 7 drm_kms_helper,amd_sched,amdttm,amdgpu,i915,amdkcl
Thank you for your help so far.
Hello? Any ideas how I can debug this further ?
There are some environment variables that enable more debug output that can help diagnose the problem. I can never remember the one in OpenCL. This one is more low-level: HSAKMT_DEBUG_LEVEL=7
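The variable is set in the environment of whichever ROCm program is being diagnosed. A minimal sketch, assuming the debug messages go to stderr; run_with_thunk_debug is a made-up helper name, and the log file path is whatever you choose.

```shell
# run_with_thunk_debug LOGFILE COMMAND [ARGS...]
# Runs COMMAND with HSAKMT_DEBUG_LEVEL=7 in its environment, leaves stdout
# untouched, and writes stderr to LOGFILE so the two can be read separately.
# Hypothetical helper; assumes the debug output is written to stderr.
run_with_thunk_debug() {
    log_file=$1
    shift
    env HSAKMT_DEBUG_LEVEL=7 "$@" 2> "$log_file"
}
```

For example: run_with_thunk_debug hsakmt.log /opt/rocm/bin/rocminfo. An empty hsakmt.log would mean that nothing was printed on stderr.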
@fxkamd thank you for your help. Which command are you referring to?
clinfo has exactly the same output:
HSAKMT_DEBUG_LEVEL=7 /opt/rocm/opencl/bin/clinfo
Profiling of privileged blocks is not available.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (3212.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
Here is the rocminfo output:
sudo HSAKMT_DEBUG_LEVEL=7 /opt/rocm/bin/rocminfo
ROCk module is loaded
Able to open /dev/kfd read-write
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: Intel(R) Celeron(R) CPU G3900 @ 2.80GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Celeron(R) CPU G3900 @ 2.80GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2800
BDFID: 0
Internal Node ID: 0
Compute Unit: 2
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 7828164(0x7772c4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 7828164(0x7772c4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A
*** Done ***
That's very strange. This should have produced some output on stderr. If not, it means it's not even getting to the low-level Thunk functions and failing somewhere higher up the stack.
Or you could check one level lower to see what KFD is reporting: ls -lR /sys/class/kfd/kfd/topology
Also check your kernel log with dmesg.
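The topology check can be reduced to a single number. This is a rough sketch that assumes CPU-only nodes report a gpu_id of 0 and GPU nodes report a non-zero value; count_kfd_gpu_nodes is a made-up helper name.

```shell
# count_kfd_gpu_nodes [NODES_DIR]
# Prints how many KFD topology nodes report a non-zero gpu_id.
# Assumption: nodes without a GPU report gpu_id 0.
count_kfd_gpu_nodes() {
    nodes_dir=${1:-/sys/class/kfd/kfd/topology/nodes}
    gpu_count=0
    for id_file in "$nodes_dir"/*/gpu_id; do
        [ -r "$id_file" ] || continue      # glob did not match or file unreadable
        gpu_id=$(cat "$id_file")
        if [ -n "$gpu_id" ] && [ "$gpu_id" != "0" ]; then
            gpu_count=$((gpu_count + 1))
        fi
    done
    echo "$gpu_count"
}
```

If this prints 0 while the cards show up in lspci, KFD did not add them to its topology and the kernel log is the next place to look.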
@fxkamd Thank you, here are the outputs
ls:
ls -lR /sys/class/kfd/kfd/topology
/sys/class/kfd/kfd/topology:
total 0
-r--r--r-- 1 root root 4096 Feb 8 16:29 generation_id
drwxr-xr-x 3 root root 0 Feb 8 16:29 nodes
-r--r--r-- 1 root root 4096 Feb 8 16:29 system_properties
/sys/class/kfd/kfd/topology/nodes:
total 0
drwxr-xr-x 6 root root 0 Feb 8 16:29 0
/sys/class/kfd/kfd/topology/nodes/0:
total 0
drwxr-xr-x 2 root root 0 Feb 9 12:01 caches
-r--r--r-- 1 root root 4096 Feb 8 16:29 gpu_id
drwxr-xr-x 2 root root 0 Feb 9 12:01 io_links
drwxr-xr-x 3 root root 0 Feb 8 16:29 mem_banks
-r--r--r-- 1 root root 4096 Feb 9 12:01 name
drwxr-xr-x 2 root root 0 Feb 9 12:01 perf
-r--r--r-- 1 root root 4096 Feb 8 16:29 properties
/sys/class/kfd/kfd/topology/nodes/0/caches:
total 0
/sys/class/kfd/kfd/topology/nodes/0/io_links:
total 0
/sys/class/kfd/kfd/topology/nodes/0/mem_banks:
total 0
drwxr-xr-x 2 root root 0 Feb 8 16:29 0
/sys/class/kfd/kfd/topology/nodes/0/mem_banks/0:
total 0
-r--r--r-- 1 root root 4096 Feb 8 16:29 properties
/sys/class/kfd/kfd/topology/nodes/0/perf:
total 0
dmesg:
dmesg | grep -i amdgpu
[ 2.734194] [drm] amdgpu kernel modesetting enabled.
[ 2.734198] [drm] amdgpu version: 5.6.19
[ 2.734261] amdgpu: CRAT table not found
[ 2.734263] amdgpu: Virtual CRAT table created for CPU
[ 2.734278] amdgpu: Topology: Add CPU node
[ 2.736844] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x2fe0000000 -> 0x2fefffffff
[ 2.736850] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x2ff0000000 -> 0x2ff01fffff
[ 2.736852] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xf7300000 -> 0xf733ffff
[ 2.736873] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[ 2.737008] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 2.737010] amdgpu 0000:01:00.0: amdgpu: set kernel compute queue number to 8 due to invalid parameter provided by user
[ 2.737052] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 2.978360] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 2.978365] amdgpu: ATOM BIOS: 113-BE366EU-Z46
[ 3.108250] amdgpu 0000:01:00.0: BAR 2: releasing [mem 0x2ff0000000-0x2ff01fffff 64bit pref]
[ 3.108254] amdgpu 0000:01:00.0: BAR 0: releasing [mem 0x2fe0000000-0x2fefffffff 64bit pref]
[ 3.108289] amdgpu 0000:01:00.0: BAR 0: assigned [mem 0x2000000000-0x21ffffffff 64bit pref]
[ 3.108300] amdgpu 0000:01:00.0: BAR 2: assigned [mem 0x2200000000-0x22001fffff 64bit pref]
[ 3.108337] amdgpu 0000:01:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 3.108340] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 3.108512] [drm] amdgpu: 8192M of VRAM memory ready
[ 3.108517] [drm] amdgpu: 8192M of GTT memory ready.
[ 3.110309] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 3.313169] amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 3.317160] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:01:00.0 on minor 1
[ 3.317240] amdgpu 0000:05:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x2fc0000000 -> 0x2fcfffffff
[ 3.317243] amdgpu 0000:05:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x2fd0000000 -> 0x2fd01fffff
[ 3.317245] amdgpu 0000:05:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xf7200000 -> 0xf723ffff
[ 3.317266] amdgpu 0000:05:00.0: enabling device (0000 -> 0003)
[ 3.317371] amdgpu 0000:05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 3.317421] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 3.573530] amdgpu 0000:05:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 3.573534] amdgpu: ATOM BIOS: 113-BE366EU-Z46
[ 3.708367] amdgpu 0000:05:00.0: BAR 2: releasing [mem 0x2fd0000000-0x2fd01fffff 64bit pref]
[ 3.708370] amdgpu 0000:05:00.0: BAR 0: releasing [mem 0x2fc0000000-0x2fcfffffff 64bit pref]
[ 3.708421] amdgpu 0000:05:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 3.708424] amdgpu 0000:05:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 3.708427] amdgpu 0000:05:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 3.708429] amdgpu 0000:05:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 3.708486] amdgpu 0000:05:00.0: BAR 0: assigned [mem 0x2fc0000000-0x2fcfffffff 64bit pref]
[ 3.708502] amdgpu 0000:05:00.0: BAR 2: assigned [mem 0x2fd0000000-0x2fd01fffff 64bit pref]
[ 3.708529] amdgpu 0000:05:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 3.708531] amdgpu 0000:05:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 3.708563] [drm] amdgpu: 8192M of VRAM memory ready
[ 3.708566] [drm] amdgpu: 8192M of GTT memory ready.
[ 3.710458] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 3.908544] amdgpu 0000:05:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 3.912299] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:05:00.0 on minor 2
[ 3.912378] amdgpu 0000:08:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x2fa0000000 -> 0x2fafffffff
[ 3.912381] amdgpu 0000:08:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x2fb0000000 -> 0x2fb01fffff
[ 3.912384] amdgpu 0000:08:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xf7100000 -> 0xf713ffff
[ 3.912406] amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
[ 3.912513] amdgpu 0000:08:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 3.912562] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 4.174650] amdgpu 0000:08:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 4.174654] amdgpu: ATOM BIOS: 113-BE366EU-Z46
[ 4.304406] amdgpu 0000:08:00.0: BAR 2: releasing [mem 0x2fb0000000-0x2fb01fffff 64bit pref]
[ 4.304409] amdgpu 0000:08:00.0: BAR 0: releasing [mem 0x2fa0000000-0x2fafffffff 64bit pref]
[ 4.304464] amdgpu 0000:08:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 4.304466] amdgpu 0000:08:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 4.304469] amdgpu 0000:08:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 4.304471] amdgpu 0000:08:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 4.304529] amdgpu 0000:08:00.0: BAR 0: assigned [mem 0x2fa0000000-0x2fafffffff 64bit pref]
[ 4.304547] amdgpu 0000:08:00.0: BAR 2: assigned [mem 0x2fb0000000-0x2fb01fffff 64bit pref]
[ 4.304572] amdgpu 0000:08:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 4.304574] amdgpu 0000:08:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 4.304601] [drm] amdgpu: 8192M of VRAM memory ready
[ 4.304605] [drm] amdgpu: 8192M of GTT memory ready.
[ 4.306839] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 4.507827] amdgpu 0000:08:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 4.511415] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:08:00.0 on minor 3
[ 4.511491] amdgpu 0000:09:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x2f80000000 -> 0x2f8fffffff
[ 4.511494] amdgpu 0000:09:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x2f90000000 -> 0x2f901fffff
[ 4.511497] amdgpu 0000:09:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xf7000000 -> 0xf703ffff
[ 4.511520] amdgpu 0000:09:00.0: enabling device (0000 -> 0003)
[ 4.511626] amdgpu 0000:09:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 4.511673] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 4.767793] amdgpu 0000:09:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 4.767796] amdgpu: ATOM BIOS: 113-BE366EU-Z46
[ 4.900318] amdgpu 0000:09:00.0: BAR 2: releasing [mem 0x2f90000000-0x2f901fffff 64bit pref]
[ 4.900321] amdgpu 0000:09:00.0: BAR 0: releasing [mem 0x2f80000000-0x2f8fffffff 64bit pref]
[ 4.900373] amdgpu 0000:09:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 4.900375] amdgpu 0000:09:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 4.900378] amdgpu 0000:09:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 4.900380] amdgpu 0000:09:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 4.900438] amdgpu 0000:09:00.0: BAR 0: assigned [mem 0x2f80000000-0x2f8fffffff 64bit pref]
[ 4.900454] amdgpu 0000:09:00.0: BAR 2: assigned [mem 0x2f90000000-0x2f901fffff 64bit pref]
[ 4.900477] amdgpu 0000:09:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 4.900480] amdgpu 0000:09:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 4.900507] [drm] amdgpu: 8192M of VRAM memory ready
[ 4.900510] [drm] amdgpu: 8192M of GTT memory ready.
[ 4.902337] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 5.100284] amdgpu 0000:09:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 5.104236] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:09:00.0 on minor 4
[ 8.234236] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 8.235903] snd_hda_intel 0000:05:00.1: bound 0000:05:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 8.238932] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 8.241749] snd_hda_intel 0000:09:00.1: bound 0000:09:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Here is your problem: [ 3.317421] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
KFD (more specifically the AQL firmware) on your Ellesmere GPUs requires PCIe v3 with support for atomic operations. Your mainboard doesn't seem to support that. On some mainboards, some slots support PCIe v3, while others do not. You could try experimenting with that. But given the age of your CPU, your chipset may just not have PCIe v3 support at all.
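To pick that message out of a long kernel log, something like the following sketch may help. It only matches the wording of the "skipped device ... PCI rejects atomics" line shown above, so other kernel versions may need a different pattern; list_kfd_skipped is a made-up helper name.

```shell
# list_kfd_skipped
# Reads kernel log text on stdin and prints one line per device ID that KFD
# skipped because PCIe atomics were rejected, as "<vendor:device> <count>".
# Hypothetical helper; the pattern is taken from the log lines in this thread.
list_kfd_skipped() {
    grep 'PCI rejects atomics' |
        sed -n 's/.*skipped device \([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\).*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage would be: dmesg | list_kfd_skipped (reading the kernel log may require root on some systems).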
I did try every slot. My board is an ASRock H110 Pro BTC+.
Even amdpro drivers do not work.
I looked up the spec of your mainboard: https://www.asrock.com/mb/Intel/H110%20Pro%20BTC+/index.asp "1 PCIe 3.0 x16, 12 PCIe 2.0 x1". Only the single x16 slot supports PCIe 3. The remaining 12 x1 slots only support PCIe 2. If you plug one of the Ellesmere cards into the x16 slot, it should work with ROCm. Unfortunately the x1 slots on your board are not suitable for ROCm.
The AMD Linux Pro driver should work though. As far as I know, it uses a legacy OpenCL implementation that does not depend on ROCm. In the 20.45 release it moved to a ROCm-based OpenCL implementation, but only for Vega and later GPUs.
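Which PCIe generation a card has actually negotiated can be read from sysfs and translated with a small lookup. This is only a sketch: pcie_gen_from_speed is a made-up helper name, and a Gen 3 link rate does not by itself prove that atomics are supported on that slot.

```shell
# pcie_gen_from_speed SPEED
# Maps a link-speed string such as "8 GT/s" or "8.0 GT/s PCIe" to the PCIe
# generation that introduced that rate; prints "unknown" otherwise.
# Hypothetical helper for illustration.
pcie_gen_from_speed() {
    case ${1%% *} in            # keep only the number before the first space
        2.5)      echo 1 ;;
        5|5.0)    echo 2 ;;
        8|8.0)    echo 3 ;;
        16|16.0)  echo 4 ;;
        32|32.0)  echo 5 ;;
        *)        echo unknown ;;
    esac
}
```

For the first card in the dmesg output above this could be called as: pcie_gen_from_speed "$(cat /sys/bus/pci/devices/0000:01:00.0/current_link_speed)"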
Hello, I have 4 AMD 480 cards:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
This is on Ubuntu 20.04 with all updates applied. I followed the install instructions, but clinfo does not detect any cards:
/opt/rocm/opencl/bin/clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (3212.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
No OpenCL applications work and no devices are detected. Can you help me diagnose this problem?
Hi @alfredopalhares,
I was having the same issue and the mesa-opencl-icd package fixed it for me. I'm using ROCm 4.2 on Ubuntu 18.04.5 LTS. The package seems to be available for 20.04 as well.
sudo apt install mesa-opencl-icd
Hope it works out for you. All the best!
I investigated this further. The command is referring to the mesa version (mesa.icd) located at /etc/OpenCL/vendors/. That isn't supposed to happen. I've done a clean install of ROCm 4.2. https://github.com/RadeonOpenCompute/ROCm/issues/511 seems to be a good reference to narrow down a solution.
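To see which OpenCL implementations are registered in that directory, a listing like the sketch below can help. It assumes each .icd file is a plain text file whose first line names the vendor library; list_opencl_icds is a made-up helper name.

```shell
# list_opencl_icds [VENDORS_DIR]
# Prints "<icd file>: <library>" for every .icd file in VENDORS_DIR.
# Assumption: the first line of each .icd file names the vendor library.
list_opencl_icds() {
    vendors_dir=${1:-/etc/OpenCL/vendors}
    for icd_file in "$vendors_dir"/*.icd; do
        [ -r "$icd_file" ] || continue     # glob did not match or file unreadable
        printf '%s: %s\n' "${icd_file##*/}" "$(head -n 1 "$icd_file")"
    done
}
```

Running list_opencl_icds with no argument shows whether mesa.icd sits next to the ROCm entry, which would explain why installing mesa-opencl-icd changes what the command reports.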
@all ,
I have a similar problem. When I ran:
/opt/rocm/opencl/bin/clinfo
I got the output:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (3275.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
When I ran:
sudo /opt/rocm/bin/rocminfo
I got the output:
ROCk module is loaded
HSA Error: Incompatible kernel and userspace, Vega 20 disabled. Upgrade amdgpu.
HSA Error: Incompatible kernel and userspace, Vega 20 disabled. Upgrade amdgpu.
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 2700X Eight-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 2700X Eight-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3700
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32871168(0x1f59300) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32871168(0x1f59300) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A
*** Done ***
But there are in fact two AMD GPUs in my computer. When I ran:
lsmod | grep amd
I got the output:
edac_mce_amd 32768 0
amdgpu 4579328 41
amd_iommu_v2 20480 1 amdgpu
gpu_sched 32768 1 amdgpu
ttm 106496 1 amdgpu
drm_kms_helper 180224 1 amdgpu
drm 487424 22 gpu_sched,drm_kms_helper,amdgpu,ttm
i2c_algo_bit 16384 2 igb,amdgpu
gpio_amdpt 20480 0
gpio_generic 20480 1 gpio_amdpt
When I ran:
dmesg | grep -i amdgpu
I got the output:
[ 1.196753] [drm] amdgpu kernel modesetting enabled.
[ 1.196907] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[ 1.196908] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
[ 1.196909] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfcd00000 -> 0xfcd7ffff
[ 1.196911] fb0: switching to amdgpudrmfb from VESA VGA
[ 1.196987] amdgpu 0000:0a:00.0: vgaarb: deactivate vga console
[ 1.197186] amdgpu 0000:0a:00.0: No more image in the PCI ROM
[ 1.197265] amdgpu 0000:0a:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 1.197266] amdgpu 0000:0a:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 1.197267] amdgpu 0000:0a:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 1.197342] [drm] amdgpu: 16368M of VRAM memory ready
[ 1.197344] [drm] amdgpu: 16368M of GTT memory ready.
[ 1.197652] amdgpu 0000:0a:00.0: Direct firmware load for amdgpu/vega20_ta.bin failed with error -2
[ 1.197653] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware "amdgpu/vega20_ta.bin"
[ 1.199147] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
[ 2.105430] fbcon: amdgpudrmfb (fb0) is primary device
[ 2.127315] amdgpu 0000:0a:00.0: fb0: amdgpudrmfb frame buffer device
[ 2.133059] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[ 2.133060] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 2.133061] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 2.133061] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 2.133062] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 2.133063] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 2.133063] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 2.133064] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 2.133065] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 2.133065] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 2.133066] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[ 2.133067] amdgpu 0000:0a:00.0: ring page0 uses VM inv eng 1 on hub 1
[ 2.133067] amdgpu 0000:0a:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[ 2.133068] amdgpu 0000:0a:00.0: ring page1 uses VM inv eng 5 on hub 1
[ 2.133068] amdgpu 0000:0a:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[ 2.133069] amdgpu 0000:0a:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[ 2.133069] amdgpu 0000:0a:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[ 2.133070] amdgpu 0000:0a:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[ 2.133071] amdgpu 0000:0a:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[ 2.133071] amdgpu 0000:0a:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[ 2.133072] amdgpu 0000:0a:00.0: ring vce0 uses VM inv eng 12 on hub 1
[ 2.133072] amdgpu 0000:0a:00.0: ring vce1 uses VM inv eng 13 on hub 1
[ 2.133073] amdgpu 0000:0a:00.0: ring vce2 uses VM inv eng 14 on hub 1
[ 2.133423] Detected AMDGPU DF Counters. # of Counters = 4.
[ 2.133440] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:0a:00.0 on minor 0
[ 2.133457] amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc0000000 -> 0xcfffffff
[ 2.133457] amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xd0000000 -> 0xd01fffff
[ 2.133458] amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfcb00000 -> 0xfcb7ffff
[ 2.133467] amdgpu 0000:0d:00.0: enabling device (0000 -> 0003)
[ 2.232072] amdgpu 0000:0d:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 2.232073] amdgpu 0000:0d:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 2.232074] amdgpu 0000:0d:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 2.232089] [drm] amdgpu: 16368M of VRAM memory ready
[ 2.232090] [drm] amdgpu: 16368M of GTT memory ready.
[ 2.232323] amdgpu 0000:0d:00.0: Direct firmware load for amdgpu/vega20_ta.bin failed with error -2
[ 2.232324] amdgpu 0000:0d:00.0: psp v11.0: Failed to load firmware "amdgpu/vega20_ta.bin"
[ 2.233892] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
[ 3.034037] amdgpu 0000:0d:00.0: ring gfx uses VM inv eng 0 on hub 0
[ 3.034038] amdgpu 0000:0d:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 3.034038] amdgpu 0000:0d:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 3.034039] amdgpu 0000:0d:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 3.034040] amdgpu 0000:0d:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 3.034040] amdgpu 0000:0d:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 3.034041] amdgpu 0000:0d:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 3.034041] amdgpu 0000:0d:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 3.034042] amdgpu 0000:0d:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 3.034042] amdgpu 0000:0d:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 3.034043] amdgpu 0000:0d:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[ 3.034044] amdgpu 0000:0d:00.0: ring page0 uses VM inv eng 1 on hub 1
[ 3.034045] amdgpu 0000:0d:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[ 3.034045] amdgpu 0000:0d:00.0: ring page1 uses VM inv eng 5 on hub 1
[ 3.034046] amdgpu 0000:0d:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[ 3.034046] amdgpu 0000:0d:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[ 3.034047] amdgpu 0000:0d:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[ 3.034048] amdgpu 0000:0d:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[ 3.034048] amdgpu 0000:0d:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[ 3.034049] amdgpu 0000:0d:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[ 3.034050] amdgpu 0000:0d:00.0: ring vce0 uses VM inv eng 12 on hub 1
[ 3.034051] amdgpu 0000:0d:00.0: ring vce1 uses VM inv eng 13 on hub 1
[ 3.034051] amdgpu 0000:0d:00.0: ring vce2 uses VM inv eng 14 on hub 1
[ 3.034550] Detected AMDGPU DF Counters. # of Counters = 4.
[ 3.034567] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:0d:00.0 on minor 1
[ 11.286234] snd_hda_intel 0000:0d:00.1: bound 0000:0d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 11.287645] snd_hda_intel 0000:0a:00.1: bound 0000:0a:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
What could be the cause, and how can I solve it?
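The dmesg output above contains "Direct firmware load ... failed with error -2" lines; error -2 is ENOENT, meaning the file was not found. A small sketch to collect such lines follows. list_failed_firmware is a made-up helper name, and a missing firmware file is not necessarily the cause of the HSA error.

```shell
# list_failed_firmware
# Reads kernel log text on stdin and prints each distinct
# "<firmware path> <error code>" pair from failed direct firmware loads.
# Hypothetical helper; the pattern is taken from the log lines in this thread.
list_failed_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p' |
        sort -u
}
```

Usage would be: dmesg | list_failed_firmware. The printed path is relative to the firmware directory, normally /lib/firmware.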
I did try every slot. My board is an ASRock H110 Pro BTC+.
I have exactly the same situation with all driver versions.
I have the same problem: since 4.5, my GPU is no longer detected with OpenCL.
Same issue here. I am using an RX 580 on Ubuntu 20.04. This is my clinfo output:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.2 AMD-APP (3361.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
And this is my rocminfo output:
ROCk module is loaded
hsa api call failure at: /long_pathname_so_that_rpms_can_package_the_debug_info/src/rocminfo/rocminfo.cc:1143
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
I installed the drivers using this: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#installing-a-rocm-package-from-a-debian-repository
I have the same issue on ROCm 4.5.2, as well as on all the earlier versions I've tried:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.2 AMD-APP (3361.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
Motherboard is set to use PCIe 3.0 on the slot of the 5700xt gpu. Motherboard is P8Z77-V DELUXE.
From dmesg: kfd kfd: amdgpu: skipped device 1102:731f, PCI rejects atomics 142<145
Same issue here
uname -a
Linux test-7 5.15.85-1-pve #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) x86_64 x86_64 x86_64 GNU/Linux
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal
clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3513.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
Hello, is there any solution for Ubuntu 20.04 with kernel 5.15?
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3486.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform
----------------------------------------------------
I tried installing the AMD drivers. Some installers work, others result in an error. The last attempt, which produced the outputs listed above, was made with:
sudo dpkg -i amdgpu-install_22.20.50200-1_all.deb