Closed Francesco149 closed 3 years ago
Kernel 4.18 does not have the new SI firmware paths patched into it AFAIK. You still need either the radeon firmware from AMDGPU-PRO or you need to copy the pitcairn/GFX6 firmware files from /lib/firmware/amdgpu to /lib/firmware/radeon. Linux 4.19 has the new SI paths and so doesn't need any modification
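The copy workaround for pre-4.19 kernels could be scripted roughly as follows. This is only a sketch: the helper name `copy_si_firmware` and the `<chip>_*.bin` file-name pattern are assumptions for illustration, so check the actual file names under /lib/firmware/amdgpu before relying on it.

```python
import shutil
from pathlib import Path


def copy_si_firmware(src_dir, dst_dir, chip="pitcairn"):
    """Copy <chip>_*.bin firmware files from src_dir into dst_dir.

    Hypothetical helper; the file-name pattern is an assumption.
    Returns the sorted names of the files that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for fw in sorted(src.glob(f"{chip}_*.bin")):
        shutil.copy2(fw, dst / fw.name)  # keeps timestamps and mode bits
        copied.append(fw.name)
    return copied

# Intended use (as root), with the paths mentioned above:
#   copy_si_firmware("/lib/firmware/amdgpu", "/lib/firmware/radeon")
```

The initramfs most likely needs to be regenerated afterwards so that the new files are picked up at boot.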
hmm, i just tried both the 4.19-rc2 release and the latest git kernel (rc2 should have the paths fix you mention though) and I still get the segfault but it's slightly different.
if i run vulkaninfo with gdb I get what looks to be the same segfault as before:
Starting program: /usr/bin/env VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_icd64.json vulkaninfo
process 783 is executing new program: /usr/bin/vulkaninfo
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
==========
VULKANINFO
==========
Vulkan Instance Version: 1.1.82
(gdb) AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:76:~MemTracker)
AMD-PAL: Warn: ================ List of Leaked Blocks ================ (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:338:MemoryReport)
AMD-PAL: Warn: ClientMem = 0x0x555555619ec0, AllocSize = 1424, MemBlkType = New, File = /home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxVamMgr.cpp, LineNumber = 431, AllocNum = 1 (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:349:MemoryReport)
AMD-PAL: Warn: ================ End of List =========================== (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:352:MemoryReport)
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:604:EarlyInit)
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5372866 in Pal::Device::HwlEarlyInit (
this=this@entry=0x5555556219a0)
at /home/loli/aur/amdvlk-git/src/pal/src/core/device.cpp:485
485 pfnTable.pfnCreateFmaskViewSrds = &DefaultCreateFmaskViewSrds;
(gdb) bt
#0 0x00007ffff5372866 in Pal::Device::HwlEarlyInit (
this=this@entry=0x5555556219a0)
at /home/loli/aur/amdvlk-git/src/pal/src/core/device.cpp:485
#1 0x00007ffff5372be8 in Pal::Device::EarlyInit (
this=this@entry=0x5555556219a0, ipLevels=...)
at /home/loli/aur/amdvlk-git/src/pal/src/core/device.cpp:423
#2 0x00007ffff53a3a58 in Pal::Linux::Device::EarlyInit (this=0x5555556219a0,
ipLevels=...)
at /home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:617
#3 0x00007ffff53a323d in Pal::Linux::Device::Create (
pPlatform=pPlatform@entry=0x555555615f38,
pSettingsPath=pSettingsPath@entry=0x555555616094 "/etc/amd",
pBusId=pBusId@entry=0x7fffffffd4d0 "pci:0000:02:00.0",
pPrimaryNode=0x555555621768 "/dev/dri/card0",
pRenderNode=0x555555621798 "/dev/dri/renderD128", pciBusInfo=...,
deviceIndex=0, ppDeviceOut=0x7fffffffd448)
at /home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:218
#4 0x00007ffff525c58b in Pal::Linux::Platform::ReQueryDevices (
this=0x555555615f38)
at /home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxPlatform.cpp:201
#5 0x00007ffff5255409 in Pal::Platform::ReEnumerateDevices (
this=this@entry=0x555555615f38)
at /home/loli/aur/amdvlk-git/src/pal/src/core/platform.cpp:599
#6 0x00007ffff5255e0d in Pal::Platform::Init (this=0x555555615f38)
at /home/loli/aur/amdvlk-git/src/pal/src/core/platform.cpp:332
#7 0x00007ffff52550b5 in Pal::Platform::Create (createInfo=..., allocCb=...,
pPlacementAddr=<optimized out>, ppPlatform=ppPlatform@entry=0x7fffffffd6c0)
at /home/loli/aur/amdvlk-git/src/pal/src/core/platform.cpp:165
#8 0x00007ffff5253951 in Pal::CreatePlatform (createInfo=...,
pPlacementAddr=<optimized out>, pPlacementAddr@entry=0x555555614e70,
ppPlatform=ppPlatform@entry=0x555555607cd0)
at /home/loli/aur/amdvlk-git/src/pal/src/core/libInit.cpp:165
#9 0x00007ffff463d6f2 in vk::Instance::Init (this=this@entry=0x555555607cd0,
pAppInfo=pAppInfo@entry=0x7fffffffe420)
at /home/loli/aur/amdvlk-git/src/xgl/icd/api/vk_instance.cpp:315
#10 0x00007ffff463e195 in vk::Instance::Create (pCreateInfo=<optimized out>,
pAllocator=<optimized out>, pInstance=0x5555555a7708)
at /home/loli/aur/amdvlk-git/src/xgl/icd/api/vk_instance.cpp:198
if i run it without gdb, around 50% of the time it manages to print some info before segfaulting https://gist.github.com/b4618f0fa6d60b25af8f1306a8be9420
if i try to run vkquake (which works fine on the regular amdgpu driver) I get this segfault:
Starting program: /usr/bin/env VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_icd64.json vkquake -basedir /home/loli/.steam/steam/steamapps/common/Quake
process 1398 is executing new program: /usr/bin/vkquake
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Command line: vkquake -basedir /home/loli/.steam/steam/steamapps/common/Quake
Found SDL version 2.0.8
Detected 8 CPUs.
Quake 1.09 (c) id Software
GLQuake 1.00 (c) id Software
FitzQuake 0.85 (c) John Fitzgibbons
FitzQuake SDL port (c) SleepwalkR, Baker
QuakeSpasm 0.93.0 (c) Ozkan Sezer, Eric Wasylishen & others
vkQuake 1.00.0 (c) Axel Gneiting & others
Host_Init
Playing registered version.
Console initialized.
UDP Initialized
Server using protocol 666 (FitzQuake)
Exe: 13:31:04 Sep 1 2018
256.0 megabyte heap
[New Thread 0x7fffd9b3d700 (LWP 1402)]
[New Thread 0x7fffd91fb700 (LWP 1403)]
[New Thread 0x7fffd89fa700 (LWP 1404)]
[New Thread 0x7fffcbfff700 (LWP 1405)]
[New Thread 0x7fffcb7fe700 (LWP 1406)]
[New Thread 0x7fffcaffd700 (LWP 1407)]
[New Thread 0x7fffca7fc700 (LWP 1408)]
[New Thread 0x7fffc9ffb700 (LWP 1409)]
[New Thread 0x7fffc97fa700 (LWP 1410)]
[New Thread 0x7fffc8ff9700 (LWP 1411)]
[New Thread 0x7fffabfff700 (LWP 1412)]
[New Thread 0x7fffab7fe700 (LWP 1413)]
[New Thread 0x7fffaaffd700 (LWP 1414)]
[Thread 0x7fffaaffd700 (LWP 1414) exited]
[New Thread 0x7fffaaffd700 (LWP 1415)]
[Thread 0x7fffaaffd700 (LWP 1415) exited]
Vulkan Initialization
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:76:~MemTracker)
AMD-PAL: Warn: ================ List of Leaked Blocks ================ (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:338:MemoryReport)
AMD-PAL: Warn: ClientMem = 0x0x555557b65930, AllocSize = 1424, MemBlkType = New, File = /home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxVamMgr.cpp, LineNumber = 431, AllocNum = 1 (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:349:MemoryReport)
AMD-PAL: Warn: ================ End of List =========================== (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:352:MemoryReport)
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:604:EarlyInit)
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:604:EarlyInit)
Vendor: AMD
Device: AMD Radeon(TM) HD 8800 Series
Using VK_KHR_DEDICATED_ALLOCATION
Using A2B10G10R10 color buffer format
Using D32 depth buffer format
Creating command buffers
Thread 1 "vkquake" received signal SIGSEGV, Segmentation fault.
0x00007fff99dbe4e9 in Pal::ICmdBuffer::ICmdBuffer (this=<optimized out>)
at /home/loli/aur/amdvlk-git/src/pal/src/core/cmdBuffer.cpp:104
104 CmdBuffer::CmdBuffer(
(gdb) bt
#0 0x00007fff99dbe4e9 in Pal::ICmdBuffer::ICmdBuffer (this=<optimized out>)
at /home/loli/aur/amdvlk-git/src/pal/src/core/cmdBuffer.cpp:104
#1 Pal::CmdBuffer::CmdBuffer (this=0x55555b6ffcd0, device=..., createInfo=...)
at /home/loli/aur/amdvlk-git/src/pal/src/core/cmdBuffer.cpp:124
#2 0x00007fff99e1276b in Pal::GfxCmdBuffer::GfxCmdBuffer (
this=0x55555b6ffcd0, device=..., createInfo=...,
pPrefetchMgr=0x55555b7021c8)
at /home/loli/aur/amdvlk-git/src/pal/src/core/hw/gfxip/gfxDevice.h:481
#3 0x00007fff99e1dab1 in Pal::UniversalCmdBuffer::UniversalCmdBuffer (
this=0x55555b6ffcd0, device=..., createInfo=...,
pPrefetchMgr=<optimized out>, pDeCmdStream=<optimized out>,
pCeCmdStream=0x55555b702858, blendOptEnable=true)
at /home/loli/aur/amdvlk-git/src/pal/src/core/hw/gfxip/universalCmdBuffer.cpp:46
#4 0x00007fff99ce6619 in Pal::Gfx6::UniversalCmdBuffer::UniversalCmdBuffer (
this=0x55555b6ffcd0, device=..., createInfo=...)
at /home/loli/aur/amdvlk-git/src/pal/src/core/hw/gfxip/gfx6/gfx6SettingsLoader.h:52
#5 0x00007fff99cb4fb4 in Pal::Gfx6::Device::CreateCmdBuffer (
this=0x555557b63880, createInfo=..., pPlacementAddr=<optimized out>,
ppCmdBuffer=0x7fffffffe240)
at /home/loli/aur/amdvlk-git/src/pal/src/core/hw/gfxip/gfx6/gfx6Device.cpp:1253
#6 0x00007fff99dc9a0b in Pal::Device::ConstructCmdBuffer (
this=0x555557b5bec0, createInfo=..., pPlacementAddr=0x55555b6ffcd0,
ppCmdBuffer=ppCmdBuffer@entry=0x7fffffffe2c0)
at /home/loli/aur/amdvlk-git/src/pal/src/core/device.cpp:2447
#7 0x00007fff99dc9b20 in Pal::Device::CreateCmdBuffer (this=<optimized out>,
createInfo=..., pPlacementAddr=<optimized out>, ppCmdBuffer=0x55555b6fda70)
at /home/loli/aur/amdvlk-git/src/pal/src/core/device.cpp:2478
#8 0x00007fff99041147 in vk::CmdBuffer::Initialize (
this=this@entry=0x55555b6fda48, pPalMem=pPalMem@entry=0x55555b6ffcd0,
pVbMem=pVbMem@entry=0x55555b7031c0, createInfo=...)
at /home/loli/aur/amdvlk-git/src/pal/inc/util/palInlineFuncs.h:82
#9 0x00007fff9904890a in vk::CmdBuffer::Create (pDevice=0x5555582a5878,
pAllocateInfo=<optimized out>, pCommandBuffers=0x5555556446e0)
at /home/loli/aur/amdvlk-git/src/xgl/icd/api/vk_cmdbuffer.cpp:483
#10 0x00007ffff7df7b95 in vkAllocateCommandBuffers ()
from /usr/lib/libvulkan.so.1
I think I have exactly the same problem with another SI card: Radeon 7970 (Tahiti). Radv is working fine. All vulkan apps cause a segmentation fault.
Arch Linux, LLVM 8.0.0 svn, Mesa 18.3 git, Linux-Firmware git, amdvlk-git
4.19 has been released on the Arch repos. I just tested with 4.19.1 and am not experiencing segfaults anymore (previously: https://github.com/mpv-player/mpv/issues/6084; also occurred with minimal testing apps but didn't have debug symbols).
My GPU is a 7950 Boost (tahiti).
@FichteFoll thanks for the update.
vulkaninfo still seems to be crashing for me, but with a different stacktrace. it's a general protection fault in strstr
==========
VULKANINFO
==========
Vulkan Instance Version: 1.1.85
==12369==
==12369== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==12369== General Protection Fault
==12369== at 0x78E17E1: strstr (string.h:324)
==12369== by 0x78E17E1: Pal::Linux::Device::Create(Pal::Linux::Platform*, char const*, char const*, char const*, char const*, _drmPciBusInfo const&, unsigned int, Pal::Linux::Device**) (lnxDevice.cpp:202)
==12369== by 0x77944EA: Pal::Linux::Platform::ReQueryDevices() (lnxPlatform.cpp:202)
==12369== by 0x778C308: Pal::Platform::ReEnumerateDevices() (platform.cpp:640)
==12369== by 0x778CD02: Pal::Platform::Init() (platform.cpp:340)
==12369== by 0x778C789: Pal::Platform::Create(Pal::PlatformCreateInfo const&, Util::AllocCallbacks const&, void*, Pal::Platform**) (platform.cpp:169)
==12369== by 0x778A800: Pal::CreatePlatform(Pal::PlatformCreateInfo const&, void*, Pal::IPlatform**) (libInit.cpp:165)
==12369== by 0x6B0C6E1: vk::Instance::Init(VkApplicationInfo const*) (vk_instance.cpp:319)
==12369== by 0x6B0D18F: vk::Instance::Create(VkInstanceCreateInfo const*, VkAllocationCallbacks const*, VkInstance_T**) (vk_instance.cpp:201)
==12369== by 0x48BDADD: ??? (in /usr/lib/libvulkan.so.1.1.85)
==12369== by 0x48C16D8: ??? (in /usr/lib/libvulkan.so.1.1.85)
==12369== by 0x48C57CD: vkCreateInstance (in /usr/lib/libvulkan.so.1.1.85)
==12369== by 0x10A521: ??? (in /usr/bin/vulkaninfo)
vkquake is also still crashing with what seems to be the same error as before
Using VK_KHR_DEDICATED_ALLOCATION
Using A2B10G10R10 color buffer format
Using D32 depth buffer format
Creating command buffers
==12483==
==12483== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==12483== General Protection Fault
==12483== at 0x28963DC9: ICmdBuffer (palCmdBuffer.h:3114)
==12483== by 0x28963DC9: Pal::CmdBuffer::CmdBuffer(Pal::Device const&, Pal::CmdBufferCreateInfo const&) (cmdBuffer.cpp:130)
==12483== by 0x289BFAAA: Pal::GfxCmdBuffer::GfxCmdBuffer(Pal::GfxDevice const&, Pal::CmdBufferCreateInfo const&, Pal::PrefetchMgr*) (gfxCmdBuffer.cpp:69)
==12483== by 0x289C9C70: Pal::UniversalCmdBuffer::UniversalCmdBuffer(Pal::GfxDevice const&, Pal::CmdBufferCreateInfo const&, Pal::PrefetchMgr*, Pal::GfxCmdStream*, Pal::GfxCmdStream*, bool) (universalCmdBuffer.cpp:60)
==12483== by 0x28885668: Pal::Gfx6::UniversalCmdBuffer::UniversalCmdBuffer(Pal::Gfx6::Device const&, Pal::CmdBufferCreateInfo const&) (gfx6UniversalCmdBuffer.cpp:222)
==12483== by 0x288551E3: Pal::Gfx6::Device::CreateCmdBuffer(Pal::CmdBufferCreateInfo const&, void*, Pal::CmdBuffer**) (gfx6Device.cpp:1262)
==12483== by 0x2896F92A: Pal::Device::ConstructCmdBuffer(Pal::CmdBufferCreateInfo const&, void*, Pal::CmdBuffer**) const (device.cpp:2360)
==12483== by 0x2896FA3F: Pal::Device::CreateCmdBuffer(Pal::CmdBufferCreateInfo const&, void*, Pal::ICmdBuffer**) (device.cpp:2391)
==12483== by 0x27B74F2D: vk::CmdBuffer::Initialize(void*, void*, Pal::CmdBufferCreateInfo const&) (vk_cmdbuffer.cpp:534)
==12483== by 0x27B7C5A1: vk::CmdBuffer::Create(vk::Device*, VkCommandBufferAllocateInfo const*, VkCommandBuffer_T**) (vk_cmdbuffer.cpp:483)
==12483== by 0x4A1F554: vkAllocateCommandBuffers (in /usr/lib/libvulkan.so.1.1.85)
==12483== by 0x1284A5: ??? (in /usr/bin/vkquake)
==12483== by 0x16D6AE: ??? (in /usr/bin/vkquake)
mpv is crashing too
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:76:~MemTracker)
AMD-PAL: Warn: ================ List of Leaked Blocks ================ (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:338:MemoryReport)
AMD-PAL: Warn: ClientMem = 0x0x1a46f400, AllocSize = 1424, MemBlkType = New, File = /home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxVamMgr.cpp, LineNumber = 440, AllocNum = 1 (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:349:MemoryReport)
AMD-PAL: Warn: ================ End of List =========================== (/home/loli/aur/amdvlk-git/src/pal/inc/util/palMemTrackerImpl.h:352:MemoryReport)
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:617:EarlyInit)
AMD-PAL: Warn: Unconditional Alert | Reason: Unknown (/home/loli/aur/amdvlk-git/src/pal/src/core/os/lnx/lnxDevice.cpp:617:EarlyInit)
==12666==
==12666== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==12666== General Protection Fault
==12666== at 0x1C4C3D17: Pal::CmdAllocator::FreeAllChunks() (cmdAllocator.cpp:264)
==12666== by 0x1C4C4294: Pal::CmdAllocator::~CmdAllocator() (cmdAllocator.cpp:207)
==12666== by 0x1C4C3C59: Destroy (cmdAllocator.h:59)
==12666== by 0x1C4C3C59: Pal::CmdAllocator::DestroyInternal() (cmdAllocator.cpp:383)
==12666== by 0x1C3BC01B: Pal::Device::Cleanup() (device.cpp:349)
==12666== by 0x1C3E7F07: Pal::Linux::Device::Cleanup() (lnxDevice.cpp:430)
==12666== by 0x1C292271: Pal::Platform::TearDownDevices() (platform.cpp:302)
==12666== by 0x1C299E0C: Pal::Linux::Platform::Destroy() (lnxPlatform.cpp:73)
==12666== by 0x1C38E7BD: Destroy (decorators.h:239)
==12666== by 0x1C38E7BD: Pal::InterfaceLogger::Platform::Destroy() (interfaceLoggerPlatform.cpp:806)
==12666== by 0x1B61403F: vk::Instance::Destroy() (vk_instance.cpp:586)
==12666== by 0x7A26146: ??? (in /usr/lib/libvulkan.so.1.1.85)
==12666== by 0x7A2FC90: vkDestroyInstance (in /usr/lib/libvulkan.so.1.1.85)
==12666== by 0x23A446: ??? (in /usr/bin/mpv)
tested on 4.19.2-arch1-1-ARCH, 4.19.1-zen2-2-zen and 4.20.0-rc2-mainline, same result on all these kernels
commits:
i tried debugging this a bit and it seems that the PAL_MALLOC_BASE call in Device::Create does something bad, because trying to do anything in the pMemory != nullptr block (even just printing a log message) results in a general protection fault
The crash still happens with my 7970 as well. I just checked out current master.
I tried to reproduce your issue but failed on my platform, with the same versions as yours:
testing application: vulkaninfo and cube
cards: Polaris10 and Vega10
ubuntu18.04
kernel: original ubuntu 18.04 kernel 4.15.0 and kernel 4.18.5
llpc: e4edfd4ff45eed666825494c966955963e908ae2
llvm: 678b8d52b91af51de5839f44144701432df30a00
pal: 2e0f13d76846e9623cd84141ab30aaa28560a348
xgl: 4730177e34e414e233cddfbe923ef64b7aac5f83
Also, I tried both the latest master and dev, everything works well.
I'm not sure what the problem is on your side; amdvlk itself seems to have no problem. In addition, our Jenkins automation runs master as well.
Ah, I reproduced your issue when I used a Tahiti card. As Zakhrov mentioned in the second post, it is indeed a firmware update problem. You need not only to update the kernel to 4.19 but also to copy the newest firmware to /lib/firmware/radeon. You can get the newest firmware from the amdgpu-pro driver with the method below:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu You can also get the right firmware from here: copy the files to /lib/firmware/amdgpu/ and make sure you are using kernel 4.19 as well.
Cheers.
@Francesco149, what was the result after you updated the firmware? Is there anything I can help with?
i was already using linux-firmware-git, but i copied the firmware from that deb and rebooted and it doesn't seem to help either
did you run "update-initramfs -u" after copying the firmware?
hm i re-ran mkinitcpio -p linux just to make sure and rebooted, but same result
is there any way to check for sure if i'm running the correct firmware?
yes, you can run "sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info"; the ME feature version should be >= 25, otherwise amdvlk will think it's an unknown card.
If this debug file does not exist, then I guess the amdgpu kernel driver is not loaded. In that case, could you confirm which driver is loaded in the kernel? Did you add radeon to the blacklist? (add 'blacklist radeon' to the end of /etc/modprobe.d/blacklist.conf)
it does look like the correct firmware
$ sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00a47500
ME feature version: 29, firmware version: 0x00000091
PFP feature version: 29, firmware version: 0x00000054
CE feature version: 29, firmware version: 0x0000003d
RLC feature version: 1, firmware version: 0x00000007
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
MEC feature version: 0, firmware version: 0x00000000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x00000000
SMC feature version: 0, firmware version: 0x10020000
SDMA0 feature version: 0, firmware version: 0x00000000
SDMA1 feature version: 0, firmware version: 0x00000000
VCN feature version: 0, firmware version: 0x00000000
VBIOS version: 113-1E27100-O48
and yea I'm sure i'm running amdgpu because I've been playing vulkan games on open-source amdgpu driver (that wouldn't work on the regular radeon driver)
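The ME feature version check described above can also be done programmatically. Below is a minimal sketch that parses text in the format shown in the output above; the function names are made up for illustration, and the threshold of 25 is simply the number quoted in this thread.

```python
import re

ME_MIN_FEATURE_VERSION = 25  # threshold quoted in this thread

# Matches lines such as:
#   ME feature version: 29, firmware version: 0x00000091
_LINE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]+)$"
)


def parse_firmware_info(text):
    """Map block name (e.g. "ME") to (feature_version, firmware_version)."""
    info = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:  # lines like "VBIOS version: ..." do not match and are skipped
            info[m.group("name")] = (int(m.group("feature")), int(m.group("fw"), 16))
    return info


def me_firmware_ok(text):
    """True if an ME entry exists and its feature version meets the threshold."""
    info = parse_firmware_info(text)
    return "ME" in info and info["ME"][0] >= ME_MIN_FEATURE_VERSION
```

Feeding it the contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info (read as root) would return True for the output pasted above, since the ME feature version there is 29.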
Yeah, your firmware is correct. Can you paste your dmesg?
$ dmesg | grep amdgpu
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=86d4f775-4e78-4a67-8017-d58293bc5e3d rw quiet radeon.si_support=1 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.gpu_recovery=1
[ 0.121632] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=86d4f775-4e78-4a67-8017-d58293bc5e3d rw quiet radeon.si_support=1 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.gpu_recovery=1
[ 1.423156] [drm] amdgpu kernel modesetting enabled.
[ 1.423724] fb: switching to amdgpudrmfb from EFI VGA
[ 1.431023] amdgpu 0000:02:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 1.431024] amdgpu 0000:02:00.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 1.431106] [drm] amdgpu: 2048M of VRAM memory ready
[ 1.431107] [drm] amdgpu: 3072M of GTT memory ready.
[ 1.431725] amdgpu 0000:02:00.0: PCIE GART of 1024M enabled (table at 0x000000F400300000).
[ 1.431816] [drm] amdgpu: dpm initialized
[ 1.698571] fbcon: amdgpudrmfb (fb0) is primary device
[ 1.880876] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
[ 2.189511] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:02:00.0 on minor 0
$ dmesg | grep radeon
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=86d4f775-4e78-4a67-8017-d58293bc5e3d rw quiet radeon.si_support=1 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.gpu_recovery=1
[ 0.121632] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=86d4f775-4e78-4a67-8017-d58293bc5e3d rw quiet radeon.si_support=1 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.gpu_recovery=1
[ 2.207876] [drm] radeon kernel modesetting enabled.
hmm maybe i should try blacklisting radeon entirely after all? i see it's enabling kernel modesetting for both radeon and amdgpu
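To spot this situation quickly in a full dmesg dump, one could scan for the "kernel modesetting enabled" lines shown above. A small sketch follows (the function name is hypothetical). It only reports which drivers logged that message; it does not by itself show which driver is actually bound to the card.

```python
import re

# Matches e.g. "[    1.423156] [drm] amdgpu kernel modesetting enabled."
_KMS = re.compile(r"\[drm\] (\w+) kernel modesetting enabled")


def kms_drivers(dmesg_text):
    """Return the driver names that logged 'kernel modesetting enabled', in order."""
    return _KMS.findall(dmesg_text)
```

If the result contains both "amdgpu" and "radeon", both modules were at least initialised during boot.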
Actually, I'd like to see the full dmesg output, not the grepped one. And yes, please add radeon to the blacklist.
btw, everything works well on my Tahiti card.
I don't think it has anything to do with the firmware. I just updated firmware-git and removed everything Tahiti-related from /lib/firmware/radeon. Still no go.
Please wrap long code segments like this in <details></details>. You may need to surround those with blank lines to allow GitHub to parse ``` as code blocks.
I have now tried some more things, but still no success. I downloaded the amd 18.4 drivers and extracted the firmware into /lib/firmware/radeon again. I noticed that 18.4 has no tahiti firmware under amdgpu, only under radeon, so I wonder if amdvlk just looks for firmware files under /radeon. My /lib/firmware/radeon folder was kind of a mess: I had a lot of files, named both TAHITI and tahiti. The TAHITI_ files looked really old, so I deleted them and copied new ones over from the driver package, but that didn't solve the problem.
Thread 1 "vkquake" received signal SIGSEGV, Segmentation fault.
0x00007fff9dede440 in ?? () from /usr/lib/amdvlk64.so
(gdb) bt
#0 0x00007fff9dede440 in ?? () from /usr/lib/amdvlk64.so
#1 0x00007fff9deea0cc in ?? () from /usr/lib/amdvlk64.so
#2 0x00007fff9dceee78 in ?? () from /usr/lib/amdvlk64.so
#3 0x00007fff9dee3ad8 in ?? () from /usr/lib/amdvlk64.so
#4 0x00007ffff7dd04fe in ?? () from /usr/lib/libvulkan.so.1
#5 0x00007ffff7dd43e9 in ?? () from /usr/lib/libvulkan.so.1
#6 0x00007ffff7dd851e in vkCreateInstance () from /usr/lib/libvulkan.so.1
#7 0x000055555557490e in ?? ()
#8 0x00005555555b9274 in ?? ()
#9 0x00005555555612fb in main ()
I see that a lot of Arch users have similar problems, so I wonder if it is an Arch-specific problem. I am currently recompiling the package without march=native and O1; let's see if it works this time...
@random2324 I recommend building the driver with debug info so you can get useful stack traces when you crash. if you're using the arch aur package you can do that by changing all Release64 to Debug64 and all Release to Debug in the PKGBUILD, then temporarily remove "strip" from options in /etc/makepkg.conf before running makepkg
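The PKGBUILD edit suggested above (Release64 to Debug64, Release to Debug) can be applied mechanically. A rough sketch, with a hypothetical function name, operating on the PKGBUILD text:

```python
import re

# "Release" as a whole word, or directly followed by "64" (i.e. "Release64").
_RELEASE = re.compile(r"\bRelease(?=64\b|\b)")


def to_debug_build(pkgbuild_text):
    """Rewrite Release/Release64 build types to Debug/Debug64."""
    return _RELEASE.sub("Debug", pkgbuild_text)
```

Words that merely start with "Release" (for example "Releases") are left alone. The "strip" option in /etc/makepkg.conf still has to be removed by hand, as described above.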
also ill post my full dmesg next reboot, sorry for being so slow
It really turned out to be a compiler issue. I compiled without march=native (in my case haswell) and it works. So I tested some other archs, and westmere is the last option that works. Sandybridge also segfaults. So the issue seems to be AVX.
very interesting, i'll try that later, thanks for testing
I'm on haswell as well by the way so it might be something specific to this arch
This was my last test on this, and it really is because of AVX. My cflags are now: CFLAGS="-O2 -pipe -march=native -mno-avx -fstack-protector-strong -fno-plt"
I guess AMD didn't intend this to be built with march=native anyway. I wonder if AMD will fix this.
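As an illustration of the workaround, a build script could append -mno-avx to whatever flags are already set. This is a sketch with a hypothetical helper name; it only edits the flag string and knows nothing about the compiler itself.

```python
import shlex


def with_avx_disabled(cflags):
    """Return cflags with -mno-avx appended unless it is already present."""
    flags = shlex.split(cflags)
    if "-mno-avx" not in flags:
        # Appended last so it follows -march=native in the flag order.
        flags.append("-mno-avx")
    return shlex.join(flags)
```

For example, `with_avx_disabled("-O2 -pipe -march=native")` yields `-O2 -pipe -march=native -mno-avx`, while a string that already contains -mno-avx is returned unchanged.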
Thanks @random2324 .
Did you add this option (-march=native) yourself? I can't find it by grep.
Could you please make a patch for this compile issue (-mno-avx) and send it for review?
yep it works great with -mno-avx, nice find @random2324
FWIW, I'm running Haswell as well and just building amdvlk-git from the AUR without changes (except when I disabled stripping to debug the segfault).
> Thanks @random2324 .
> Did you add this option (-march=native) yourself? I can't find it by grep.
-march=native probably won't be used by many people. It is described on the Arch wiki page https://wiki.archlinux.org/index.php/Makepkg#Building_optimized_binaries but it's not the default. The Arch AUR PKGBUILD also disables other flags: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=amdvlk-git
Maybe AMD could fix this?
> Could you please make a patch for this compile issue (-mno-avx) and send it for review?
Well, it is still a bit unclear why this happens for some users and not for others. This needs to be investigated.
@FichteFoll Maybe it's because of GCC? I use GCC 8.2.1 20181127; I haven't tested others.
@FichteFoll is probably not using -march=native
Yeah, I'm using -march=x86-64 -mtune=generic, the default. I missed that changing to -march=native was important here.
If you want me to try building with -march=native and then also with -mno-avx, I can do so, since I seem to have the same hardware generations.
Out of curiosity, how did you set your CFLAGS/CXX_FLAGS? @random2324
I failed to reproduce your issue today with "cmake -H. -Bbuilds/dbg64 -DCMAKE_C_FLAGS=-march=native -DCMAKE_CXX_FLAGS=-march=native", is that enough?
this is what I had:
-march=native -mtune=native -O3 -pipe -fstack-protector-strong -fno-plt
you can try specifically enabling avx with something like -mavx to trigger the issue
The issue is still there with latest code drop.
> Out of curiosity, how did you set your CFLAGS/CXX_FLAGS? @random2324
I can't really help you here. I use Arch Linux, and its build system usually does the job of setting the cflags for you.
However there is some discussion going on here: https://stackoverflow.com/questions/10085945/set-cflags-and-cxxflags-options-using-cmake Maybe that helps.
I suggest adding -mno-avx to CFLAGS/CXX_FLAGS for now, until we fix it.
Hi! Is it fixed now?
@RarogCmex
> Hi! Is it fixed now?
The results from my current debugging efforts say no.
Please help create a new issue if anyone still sees the issue
This issue is at last fixed in 2021.Q3.1. Compiling with avx and avx2 support is working now.
the kernel has amdgpu.si_support=1 amdgpu.cik_support=1 and the regular amdgpu driver works fine with vulkan
I have verified that my linux-firmware is up to date (20180825)
backtrace of vulkaninfo with amdvlk built in debug mode