ewagner12 / all-ways-egpu

Configure eGPU as primary under Linux Wayland desktops
MIT License
179 stars 12 forks

Blackmagic Design eGPU RX580, Ubuntu on mac pro. #9

Open kimasplund opened 1 year ago

kimasplund commented 1 year ago

Did anyone get this thing working, or is it just an expensive paperweight? It seems it doesn't even work on macOS anymore, other than as a USB hub.

The internal gpu is detected just fine...

$ sudo dmesg | grep -i amdgpu
[    7.505654] [drm] amdgpu kernel modesetting enabled.
[    7.505888] amdgpu: CRAT table not found
[    7.505890] amdgpu: Virtual CRAT table created for CPU
[    7.505903] amdgpu: Topology: Add CPU node
[    7.506153] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from VFCT
[    7.506155] amdgpu: ATOM BIOS: 113-C97501U-005
[    7.506437] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    7.506439] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    7.518821] amdgpu 0000:01:00.0: BAR 2: releasing [mem 0xc0000000-0xc01fffff 64bit pref]
[    7.518825] amdgpu 0000:01:00.0: BAR 0: releasing [mem 0xb0000000-0xbfffffff 64bit pref]
[    7.518831] amdgpu 0000:01:00.0: BAR 0: assigned [mem 0xb0000000-0xbfffffff 64bit pref]
[    7.518864] amdgpu 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc01fffff 64bit pref]
[    7.518908] amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    7.518912] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    7.519012] [drm] amdgpu: 4096M of VRAM memory ready
[    7.519013] [drm] amdgpu: 7938M of GTT memory ready.
[    7.537764] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[    7.612729] amdgpu: Voltage value looks like a Leakage ID but it's not patched
[    7.612731] amdgpu: Voltage value looks like a Leakage ID but it's not patched
[    7.612732] amdgpu: Voltage value looks like a Leakage ID but it's not patched
[    7.645877] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[    7.705638] amdgpu 0000:01:00.0: amdgpu: [drm] Skipping amdgpu DM backlight registration
[    7.865233] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    7.865280] amdgpu: sdma_bitmap: f
[    7.865299] amdgpu: SRAT table not found
[    7.865300] amdgpu: Virtual CRAT table created for GPU
[    7.865349] amdgpu: Topology: Add dGPU node [0x67ef:0x1002]
[    7.865351] kfd kfd: amdgpu: added device 1002:67ef
[    7.865360] amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 16
[    7.869938] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:01:00.0 on minor 0
[    7.879929] fbcon: amdgpudrmfb (fb0) is primary device
[    8.565262] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device

And boltctl finds the eGPU.

$ boltctl
 ● Blackmagic Design eGPU RX580
   ├─ type:          peripheral
   ├─ name:          eGPU RX580
   ├─ vendor:        Blackmagic Design
   ├─ uuid:          c6030000-0062-640e-83d7-1b861ca51802
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     c4010000-0070-7c0e-8307-e20e82822021
   │  ├─ rx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  ├─ tx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    Mon 27 Mar 2023 11:06:27 UTC
   ├─ connected:     Mon 27 Mar 2023 11:06:20 UTC
   └─ stored:        Sun 26 Mar 2023 16:09:41 UTC
      ├─ policy:     auto
      └─ key:        no

But there it seems to end...

Any ideas what to do? Or should I just put it on the garbage heap and call it a day?

ewagner12 commented 1 year ago

@kimasplund 1002:67ef is the device ID for an RX 460, not an RX 580. I'm guessing someone either replaced the internal 580 with a dead 460, or they tried to flash an RX 460 VBIOS onto the 580. I would try flashing the original VBIOS (here's a copy) and see if that fixes it. I believe Igor's Lab has an amdvbflash tool for Linux; otherwise there's an official BIOS flashing tool that only works on Windows.
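As a rough illustration of the ID check described above, here is a minimal Python sketch that pulls `vendor:device` pairs out of the `added device` lines in dmesg output and looks them up. The function name `identify_kfd_devices` and the two-entry lookup table are assumptions made for this example; the table is a tiny hand-written subset, not a complete PCI ID database.

```python
import re

# Hand-written subset of AMD (vendor 0x1002) PCI device IDs, for illustration only.
KNOWN_AMD_IDS = {
    "67ef": "Baffin (Radeon RX 460/560 family)",
    "67df": "Ellesmere (Radeon RX 470/480/570/580 family)",
}

# Matches lines such as "kfd kfd: amdgpu: added device 1002:67ef".
ID_PATTERN = re.compile(r"added device ([0-9a-f]{4}):([0-9a-f]{4})", re.IGNORECASE)


def identify_kfd_devices(dmesg_text):
    """Return (id, name) tuples for every 'added device' line in dmesg_text."""
    results = []
    for vendor, device in ID_PATTERN.findall(dmesg_text):
        vendor, device = vendor.lower(), device.lower()
        if vendor == "1002":
            name = KNOWN_AMD_IDS.get(device, "unknown AMD device")
        else:
            name = "non-AMD device"
        results.append((f"{vendor}:{device}", name))
    return results


sample = "[    7.865351] kfd kfd: amdgpu: added device 1002:67ef"
print(identify_kfd_devices(sample))
```

With the log line from this issue, the lookup reports a Baffin part rather than the Ellesmere chip an RX 580 would normally show.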

kimasplund commented 1 year ago

Hey. I'll try to reflash it and see if that sorts things out. But I guess the RX 460 ID that comes up there is the integrated graphics on my MacBook. This is what comes up when the eGPU is not connected.

$ sudo ./amdvbflash -i
AMDVBFLASH version 4.71, Copyright (c) 2020 Advanced Micro Devices, Inc.

adapter seg  bn dn dID       asic           flash      romsize test    bios p/n
======= ==== == == ==== =============== ============== ======= ==== ================
   0    0000 01 00 67EF Polaris11       R600 SPI         10000 fail       -

I'll check with the eGPU again later today when I get back to my office.

kimasplund commented 1 year ago

So, let's see...

With the eGPU plugged in:

I'm really not sure what to do about this anymore :D Any suggestions?

$ lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) (rev 07)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x4) (rev 07)
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Device a313 (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev c2)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]
02:00.0 Mass storage controller: Apple Inc. ANS2 NVMe Controller (rev 01)
02:00.1 Non-VGA unclassified device: Apple Inc. T2 Bridge Controller (rev 01)
02:00.2 Non-VGA unclassified device: Apple Inc. T2 Secure Enclave Processor (rev 01)
02:00.3 Multimedia audio controller: Apple Inc. Apple Audio Device (rev 01)
03:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4364 802.11ac Wireless Network Adapter (rev 03)
04:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 06)
05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
07:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
7a:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 06)
7b:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
7b:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
7b:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
7b:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
7c:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
7d:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
$ boltctl 
 ● Blackmagic Design eGPU RX580
   ├─ type:          peripheral
   ├─ name:          eGPU RX580
   ├─ vendor:        Blackmagic Design
   ├─ uuid:          c6030000-0062-640e-83d7-1b861ca51802
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     c4010000-0070-7c0e-8307-e20e82822021
   │  ├─ rx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  ├─ tx speed:   20 Gb/s = 2 lanes * 10 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    Tue 28 Mar 2023 07:55:33 UTC
   ├─ connected:     Thu 01 Jan 1970 02:35:33 UTC
   └─ stored:        Sun 26 Mar 2023 16:09:41 UTC
      ├─ policy:     auto
      └─ key:        no
$ sudo dmesg | grep -i amd
[    0.000000]   AMD AuthenticAMD
[    0.028634] RAMDISK: [mem 0x628a3000-0x66e46fff]
[    0.028693] ACPI: VFCT 0x000000007AFBA000 00F284 (v01 APPLE  Apple00  00000001 AMD  31504F47)
[    5.982280] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
[    7.520371] [drm] amdgpu kernel modesetting enabled.
[    7.520532] amdgpu: CRAT table not found
[    7.520535] amdgpu: Virtual CRAT table created for CPU
[    7.520544] amdgpu: Topology: Add CPU node
[    7.520750] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from VFCT
[    7.520753] amdgpu: ATOM BIOS: 113-C97501U-005
[    7.520992] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    7.520995] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    7.537488] amdgpu 0000:01:00.0: BAR 2: releasing [mem 0xc0000000-0xc01fffff 64bit pref]
[    7.537506] amdgpu 0000:01:00.0: BAR 0: releasing [mem 0xb0000000-0xbfffffff 64bit pref]
[    7.537512] amdgpu 0000:01:00.0: BAR 0: assigned [mem 0xb0000000-0xbfffffff 64bit pref]
[    7.537541] amdgpu 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc01fffff 64bit pref]
[    7.537556] amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    7.537558] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    7.537607] [drm] amdgpu: 4096M of VRAM memory ready
[    7.537609] [drm] amdgpu: 7938M of GTT memory ready.
[    7.559287] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[    7.630046] amdgpu: Voltage value looks like a Leakage ID but it's not patched
[    7.630048] amdgpu: Voltage value looks like a Leakage ID but it's not patched
[    7.630049] amdgpu: Voltage value looks like a Leakage ID but it's not patched
[    7.661685] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[    7.726651] amdgpu 0000:01:00.0: amdgpu: [drm] Skipping amdgpu DM backlight registration
[    7.886152] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    7.886216] amdgpu: sdma_bitmap: f
[    7.886248] amdgpu: SRAT table not found
[    7.886249] amdgpu: Virtual CRAT table created for GPU
[    7.886294] amdgpu: Topology: Add dGPU node [0x67ef:0x1002]
[    7.886295] kfd kfd: amdgpu: added device 1002:67ef
[    7.886306] amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 16
[    7.891014] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:01:00.0 on minor 0
[    7.901048] fbcon: amdgpudrmfb (fb0) is primary device
[    8.563757] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
$ sudo ./amdvbflash -i
AMDVBFLASH version 4.71, Copyright (c) 2020 Advanced Micro Devices, Inc.

adapter seg  bn dn dID       asic           flash      romsize test    bios p/n    
======= ==== == == ==== =============== ============== ======= ==== ================
   0    0000 01 00 67EF Polaris11       R600 SPI         10000 fail       -
$ lsmod | grep amd
amdgpu              14204928  26
iommu_v2               24576  1 amdgpu
drm_buddy              20480  1 amdgpu
gpu_sched              61440  1 amdgpu
drm_ttm_helper         16384  1 amdgpu
ttm                   106496  2 amdgpu,drm_ttm_helper
drm_display_helper    208896  1 amdgpu
drm_kms_helper        241664  4 drm_display_helper,amdgpu
i2c_algo_bit           20480  1 amdgpu
drm                   684032  15 gpu_sched,drm_kms_helper,drm_display_helper,drm_buddy,amdgpu,drm_ttm_helper,ttm
video                  69632  1 amdgpu
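
One way to read the lspci listing above is to filter it for display-class devices and see which bus addresses they sit on. The sketch below does that in Python; the helper name `find_display_devices` is an assumption for this example, and the sample text is a short excerpt of the output posted in this thread, not the full listing.

```python
import re

# Class names that lspci prints for GPUs and other display adapters.
DISPLAY_CLASSES = {"VGA compatible controller", "Display controller", "3D controller"}

# "01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. ..."
LINE_PATTERN = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) (.+?): (.+)$")


def find_display_devices(lspci_text):
    """Return (address, description) for each display-class line of lspci output."""
    devices = []
    for line in lspci_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match and match.group(2) in DISPLAY_CLASSES:
            devices.append((match.group(1), match.group(3)))
    return devices


# Excerpt of the lspci output from this thread (eGPU plugged in).
sample = """\
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev c2)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]
7b:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
7d:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
"""

for address, description in find_display_devices(sample):
    print(address, description)
```

On this excerpt, as in the full listing above, the only display-class entry is 01:00.0, the Baffin GPU; no second display device shows up alongside the Thunderbolt bridges.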