gnif / vendor-reset

Linux kernel vendor specific hardware reset module for sequences that are too complex/complicated to land in pci_quirks.c
GNU General Public License v2.0

RX 6800 #35

Closed TheCherry closed 3 years ago

TheCherry commented 3 years ago

Does that work for the RX 6800 too?

gnif commented 3 years ago

The rx6800 does not need a reset fix.

labor4 commented 3 years ago

The rx6800 does not need a reset fix.

I was wondering this about the whole new 6000 series, but could only guess from the lack of negative reports. In times of shortage, that is an uncertain conclusion to draw.

Is there a matrix of reported reset bugs per GPU somewhere? If not, would this be a good idea?

gnif commented 3 years ago

The entire 6000 series is fine; this only affects Vega10, Vega20, and Navi10 GPUs. See: https://www.reddit.com/r/Amd/comments/jehkey/will_big_navi_support_function_level_reset_flr/gcqdtt2?utm_source=share&utm_medium=web2x&context=3

bobafetthotmail commented 3 years ago

@labor4 You can follow Level1Techs on YouTube (or on their forums). Wendell is very involved in the KVM/passthrough/Linux scene and did testing and reports on 6000 series cards a few months ago.

TheCherry commented 3 years ago

The rx6800 does not need a reset fix.

This is not true. At the moment I always have to run a bash script or reboot the host after I shut down my Windows VM. Otherwise the VM can't attach the GPU and boot.

At the moment I use this script:

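# Remove the PCI function from the bus (run as root)
echo "1" | tee -a /sys/bus/pci/devices/0000\:2d\:00.1/remove
sleep 1
# Suspend the host to RAM; the script continues after resume
echo -n mem > /sys/power/state
sleep 1
# Rescan the PCI bus so the removed device is detected again
echo "1" | tee -a /sys/bus/pci/rescan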

But I often have to run it 2-3 times before it works.
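
Since the workaround above sometimes needs several runs, it can help to confirm that the device actually reappeared after the rescan before starting the VM. The following is only a minimal sketch: the function name `wait_for_pci_device` and the `SYSFS_PCI` variable are hypothetical helpers for illustration, and the check only tests that the device directory exists in sysfs, not that the card is functional.

```shell
#!/bin/sh
# Hypothetical helper; SYSFS_PCI and the function name are illustrative only.
# SYSFS_PCI can be overridden to point at a test directory.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# wait_for_pci_device ADDR [TRIES]
# Poll once per second, up to TRIES times (default 5), until the sysfs
# directory for the PCI address ADDR exists. Returns 0 if found, 1 if not.
wait_for_pci_device() {
    addr="$1"
    tries="${2:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ -d "$SYSFS_PCI/devices/$addr" ]; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

After the rescan step, something like `wait_for_pci_device 0000:2d:00.1 || echo "device did not come back"` would show whether another run of the script is needed.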

gnif commented 3 years ago

This is an issue with your system then. The RX 6800 does not need any patches to allow it to restart. If it did, even your workaround would not work, because the result of a failed reset is a completely dead card that requires a full reboot to restore.

flamme-demon commented 3 years ago

hi,

For my part, I am running a 6700 on XCP-ng, and I confirm that after a reboot there is an error 43.

gnif commented 3 years ago

error 43

This is NOT the reset bug. The reset bug crashes the card out completely, making it fall off the PCIe bus and breaking your system entirely.

flamme-demon commented 3 years ago

After shutting down or restarting the VM, the device is not usable at all until the physical machine is fully restarted or it is located. Windows displays it as error 43, but for me it is no longer accessible.

gnif commented 3 years ago

Windows displays it as error 43

You can clearly still start a VM with vfio-pci bound to the device, even if the device is non-functional. I say again, this is NOT the reset bug. You have something else wrong with your system, perhaps a motherboard that isn't playing nice with VFIO/IOMMU, as is common.

The reset bug prevents the VM from even being able to start, as vfio-pci tries and fails to reset the device, resulting in a hung or completely broken PCIe device on the bus.
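
As a rough way to tell the two situations apart from the host side, one can check whether the device is still enumerated in sysfs and which driver is bound to it. This is only a sketch under stated assumptions: `pci_device_state` and `SYSFS_PCI` are hypothetical names, and a hung device may still be listed in sysfs, so a "present" result does not prove the card is healthy.

```shell
#!/bin/sh
# Hypothetical helper; SYSFS_PCI and the function name are illustrative only.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# pci_device_state ADDR
# Print "missing" (and return 1) if the PCI address is not enumerated,
# otherwise print "present driver=<name>", with "none" if no driver is bound.
pci_device_state() {
    dir="$SYSFS_PCI/devices/$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    if [ -L "$dir/driver" ]; then
        # The driver entry is a symlink whose target name is the driver.
        driver=$(basename "$(readlink "$dir/driver")")
    else
        driver="none"
    fi
    echo "present driver=$driver"
}
```

For example, `pci_device_state 0000:2d:00.0` (the address here is an assumed example) reporting `present driver=vfio-pci` matches the case where a VM can still be started with the device attached, while `missing` means the device is no longer enumerated on the bus.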