CUDA doesn't work on MSI boards

frantic-from-paranoiawf commented 1 week ago

Component

Dasharo firmware

Device

MSI Pro Z790-P

Dasharo version

Latest dasharo branch

Dasharo Tools Suite version

No response

Test case ID

No response

Brief summary

Nvidia CUDA with Torch fails to initialize on MSI boards running Dasharo. Display output works, though.

How reproducible

100% of the time.

How to reproduce

Install proprietary Nvidia drivers
Insert RTX card into top slot
Boot into Linux with display connected to the Nvidia card
Try to load a program that uses CUDA (e.g. oobabooga)

Expected behavior

The Torch + CUDA program starts with no errors and can use CUDA for deep learning.

Actual behavior

Torch CUDA program returns the following error:

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu
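
For anyone trying to reproduce this without installing oobabooga, the check below is a minimal sketch of how the failure could be surfaced from Python directly. It assumes PyTorch is installed in the active venv; the helper name `cuda_status` is made up for illustration and is not part of PyTorch or oobabooga.

```python
def cuda_status():
    """Return a one-line description of whether PyTorch can use CUDA."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return "torch is not installed"

    # is_available() returns False instead of raising when the driver
    # cannot be initialized or no CUDA device is visible.
    if not torch.cuda.is_available():
        return "CUDA unavailable (driver failed to initialize or no GPU visible)"

    try:
        torch.cuda.init()  # force initialization instead of waiting for lazy init
        name = torch.cuda.get_device_name(0)
    except RuntimeError as exc:
        return f"CUDA init failed: {exc}"
    return f"CUDA OK: {name}"


if __name__ == "__main__":
    print(cuda_status())
```

Running this on both vendor firmware and Dasharo with the same driver build should show whether the difference is visible at the PyTorch level alone.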

Screenshots

Additional context

Option ROMs loading in UEFI is enabled
AI program is oobabooga
Using Gentoo Linux (all dependencies are installed, contained, and run inside of a Python venv; Gentoo isn't the cause)
Display output and CUDA work perfectly and quickly on vendor firmware
X11 is being used

Solutions you've tried

Loading the Nvidia driver explicitly first in xorg.conf
Disabling Option ROMs loading in firmware
Recloning oobabooga and redownloading all dependencies
Enabling Resizable BAR in firmware

frantic-from-paranoiawf commented 1 week ago

Closing this as it has been solved. From what I can tell (I don't know why), the kernel-open modules work with CUDA on vendor firmware, but you need the proprietary modules for CUDA on Dasharo. That might not be the case in general, but it's what fixed it for me.
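
Since the fix depended on which kernel module flavor was loaded, it may help to record that in reports. The sketch below guesses the flavor from `/proc/driver/nvidia/version`. It assumes that the open modules include the phrase "Open Kernel Module" on the `NVRM version:` line, which is what recent drivers appear to print, but the exact wording is an assumption and may differ between driver releases. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the loaded NVIDIA kernel module exposes its version string here.
NVRM_VERSION_FILE = Path("/proc/driver/nvidia/version")


def classify_nvrm(text):
    """Classify the module flavor from the contents of the version file."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            # Assumed marker: open modules mention "Open Kernel Module".
            return "open" if "Open Kernel Module" in line else "proprietary"
    return "unknown"


def loaded_flavor(path=NVRM_VERSION_FILE):
    """Return 'open', 'proprietary', 'unknown', or 'not loaded'."""
    try:
        return classify_nvrm(path.read_text())
    except OSError:
        # The file is absent when no NVIDIA kernel module is loaded.
        return "not loaded"


if __name__ == "__main__":
    print(loaded_flavor())
```

Pairing this output with the CUDA result on each firmware would make it easier to confirm whether the open modules are the common factor.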

miczyg1 commented 1 week ago

If CUDA doesn't work with the open drivers, it is still a valid issue. From the coreboot Matrix channel I saw that it works with any drivers on MSI firmware.

renehoj commented 1 week ago

I'm getting the same error using Qubes OS; this command solves the problem for me:
sudo nvidia-smi --id 000:00:01.0 --persistence-mode 1
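
If the persistence-mode workaround needs to be applied before a CUDA program starts, it could be wrapped in a small helper like the sketch below. It only rebuilds the command quoted in the comment above. The GPU ID is the one from that comment and will differ per system, the helper names are hypothetical, and the call needs root privileges to take effect.

```python
import shutil
import subprocess


def persistence_cmd(gpu_id, enable=True):
    """Build the nvidia-smi argument list used in the workaround."""
    return ["nvidia-smi", "--id", gpu_id,
            "--persistence-mode", "1" if enable else "0"]


def enable_persistence(gpu_id):
    """Run the workaround; return the exit code, or None if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return None
    # Must be run as root (e.g. via sudo) for the setting to be applied.
    result = subprocess.run(persistence_cmd(gpu_id),
                            capture_output=True, text=True, check=False)
    return result.returncode


if __name__ == "__main__":
    # GPU ID copied from the comment above; adjust for your system.
    print(enable_persistence("000:00:01.0"))
```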
