amshafer / nvidia-driver

Fork of the Nvidia FreeBSD driver to port the nvidia-drm.ko module from Linux

Guidance for using Prime with Xorg #13

Closed NorwegianRockCat closed 1 year ago

NorwegianRockCat commented 1 year ago

Hi,

Thanks for your work here. I've finally had a chance to try this out with 13.2-RC6 on a Thinkpad T560 with Skylake graphics and an NVIDIA GeForce 940MX. I also rebuilt Xorg with your DRM patch.

I have tried to follow instructions for using Prime, but I can't find anything definitive and keep bouncing around web search results. I tried generating an xorg config with: nvidia-xconfig -prime. With that, the X server starts but the screen is blank, and I cannot see any errors in the log file. If I change the ordering (that is, put the Intel modesetting before nvidia), I get an image on the screen.

With this second configuration, it seems that PRIME sort of works. For example, if I run vkcube-xcb, it shows an image and writes: "Selected GPU 0: NVIDIA GeForce 940MX, type: DiscreteGpu" on the terminal.

I also get glxinfo from the nvidia card if I set the appropriate environment variables.
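For reference, a minimal sketch of what setting those variables might look like. The post does not say which variables were used, so this assumes the render-offload variables documented in NVIDIA's driver README; the wrapper name prime_run is made up for illustration.

```shell
# Hypothetical helper: run one command with NVIDIA's PRIME render-offload
# variables set (assumed here; the original post does not name them).
prime_run() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        "$@"
}
```

Usage would be along the lines of `prime_run glxinfo | grep "OpenGL renderer"`, which should name the NVIDIA card if offloading works.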

If I try and run xrandr --listproviders, I don't get nvidia listed:

xrandr --listproviders 
Providers: number : 1
Provider 0: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 5 associated providers: 0 name:modesetting

So, I look at the Xorg.0.log and I see that it is loaded and unloaded:

[  7046.333] (II) LoadModule: "nvidia"
[  7046.333] (II) Loading /usr/local/lib/xorg/modules/drivers/nvidia_drv.so
[  7046.333] (II) Module nvidia: vendor="NVIDIA Corporation"
[  7046.333]    compiled for 1.6.99.901, module version = 1.0.0
[  7046.333]    Module class: X.Org Video Driver
[  7046.333] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[  7046.333] (II) NVIDIA dlloader X Driver  530.30.02  Wed Feb 22 03:40:15 UTC 2023
[  7046.333] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
...
[  7046.357] (II) Loading sub module "wfb"
[  7046.357] (II) LoadModule: "wfb"
[  7046.357] (II) Loading /usr/local/lib/xorg/modules/libwfb.so
[  7046.358] (II) Module wfb: vendor="X.Org Foundation"
[  7046.358]    compiled for 1.21.1.8, module version = 1.0.0
[  7046.358]    ABI class: X.Org ANSI C Emulation, version 0.4
[  7046.359] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  7046.359] (EE) Screen 1 deleted because of no matching config section.
[  7046.359] (II) UnloadModule: "nvidia"
[  7046.359] (II) UnloadSubModule: "wfb"

Is there something that I'm missing here? I can provide more information, I was unsure how much I should provide in an initial list.

FWIW, I did use both cards on this machine with Bumblebee/Prime on Ubuntu 16.04, so things at least worked back then. I suspect I'm missing something.

amshafer commented 1 year ago

Sorry for the delay, I've got a writeup for you about configuring PRIME:

https://badland.io/prime-configuration.md

Can you give the auto-configuration strategy I outlined there a try? I think it should simplify things for you. That post is mostly polished up but if you find anything you'd like me to clarify just let me know.

If the screen is black after configuring you're probably missing the xrandr --setprovideroutputsource modesetting NVIDIA-0 step I mention in the post.
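A guarded version of that step could look like the sketch below. The provider names modesetting and NVIDIA-0 come from this thread; the helper name and the follow-up `xrandr --auto` (taken from NVIDIA's general PRIME instructions) are assumptions. It prints the commands only when both providers are actually listed, since the step cannot work otherwise.

```shell
# Hypothetical helper: read `xrandr --listproviders` output on stdin and
# print the output-source commands only if both providers are present.
output_source_cmds() {
    providers=$(cat)
    for name in modesetting NVIDIA-0; do
        printf '%s\n' "$providers" | grep -q "name:$name\$" || {
            echo "provider $name not listed" >&2
            return 1
        }
    done
    echo "xrandr --setprovideroutputsource modesetting NVIDIA-0"
    echo "xrandr --auto"
}
```

Something like `xrandr --listproviders | output_source_cmds` shows what would be run; piping the result to `sh` runs it.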

NorwegianRockCat commented 1 year ago

Thank you for the write up! I appreciate that you also provide information about how PRIME should work and the difference between auto-configuration and the manual configuration. It seems I was actually doing a mix of both, which was confusing and wasn't helpful.

Unfortunately, the autoconfiguration that you specified didn't seem to work for me (yet). I'll outline what I did and maybe we can spot the problem. I apologize in advance for the length, but would rather put too much than not enough.

I cleared out /usr/local/etc/X11 and added the two files suggested from your documentation (for intel and nvidia):

# xorg.conf.d/20-intel.conf 
Section "OutputClass"
    Identifier "intel"
    MatchDriver "i915"
    Driver "modesetting"
    Option "PrimaryGPU" "yes"
EndSection

# xorg.conf.d/20-nvidia-drm-outputclass.conf 

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
EndSection

Running with this configuration and looking at /var/log/Xorg.0.log gets me:

[  4348.038] (==) Matched intel as autoconfigured driver 0
[  4348.038] (==) Matched modesetting as autoconfigured driver 1
[  4348.038] (==) Matched scfb as autoconfigured driver 2
[  4348.038] (==) Matched vesa as autoconfigured driver 3

So, it matches the intel driver and the NVidia driver is /never/ loaded.

If I comment out the PrimaryGPU option on Intel and switch it on in the nvidia output class, I get the same result: the nvidia driver is never loaded.

If I add a BusID to the OutputClass, I get the same result. If I add

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:6:0:0"
EndSection

It also does nothing. At the same time xrandr only gives me:

Providers: number : 1
Provider 0: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 5 associated providers: 0 name:modesetting

I remember back in 2017 when I was trying trueos with this exact machine that I needed to specify the BusID for the Nvidia driver to load in X, but I never went further because I had a blank screen back then too (likely as you point out in your write-up that the laptop screen is wired to the iGPU).

If I get rid of the autoconfigured items and instead go with a manual setup based on nvidia-xconfig, things are more what I talked about last time, X loads both the modesetting and nvidia driver, but unloads the one that isn't put in the screen section.

That is:

Given my xorg.conf that says:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 530.30.02

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Inactive       "dGPU"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "Module"
    Load           "dbe"
    Load           "extmod"
    Load           "type1"
    Load           "freetype"
    Load           "glx"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/sysmouse"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "keyboard"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "dGPU"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:6:0:0"
EndSection

Section "Device"
    Identifier     "iGPU"
    Driver         "modesetting"
    VendorName     "Intel"
    BusID          "PCI:0:2:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "iGPU"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

The Xorg.0.log will include these lines for nvidia:

[  5811.545] (II) LoadModule: "nvidia"
[  5811.545] (II) Loading /usr/local/lib/xorg/modules/drivers/nvidia_drv.so
[  5811.545] (II) Module nvidia: vendor="NVIDIA Corporation"
[  5811.545]    compiled for 1.6.99.901, module version = 1.0.0
[  5811.545]    Module class: X.Org Video Driver
[  5811.545] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[  5811.545] (II) NVIDIA dlloader X Driver  530.30.02  Wed Feb 22 03:40:15 UTC 2023
[  5811.545] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
...
[  5811.564] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  5811.564] (EE) Screen 1 deleted because of no matching config section.
[  5811.564] (II) UnloadModule: "nvidia"
...

and xrandr --listproviders looks as it did above:

Providers: number : 1
Provider 0: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 5 associated providers: 0 name:modesetting

If I instead change the nvidia (dGPU) to have a screen and iGPU to be inactive we get a blank screen (that is, I get a cursor and the moused mouse pointer) with the following in the log:

[  6185.174] (II) LoadModule: "nvidia"
[  6185.174] (II) Loading /usr/local/lib/xorg/modules/drivers/nvidia_drv.so
[  6185.175] (II) Module nvidia: vendor="NVIDIA Corporation"
[  6185.175]    compiled for 1.6.99.901, module version = 1.0.0
[  6185.175] (II) LoadModule: "modesetting"
[  6185.175] (II) Loading /usr/local/lib/xorg/modules/drivers/modesetting_drv.so
[  6185.176] (II) Module modesetting: vendor="X.Org Foundation"
[  6185.176]    compiled for 1.21.1.8, module version = 1.21.1
[  6185.176]    Module class: X.Org Video Driver
[  6185.176]    ABI class: X.Org Video Driver, version 25.2
[  6185.176] (II) NVIDIA dlloader X Driver  530.30.02  Wed Feb 22 03:40:15 UTC 2023
[  6185.176] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  6185.177] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[  6185.197] (**) modeset(1): claimed PCI slot 0@0:2:0
[  6185.197] (II) modeset(1): using default device
[  6185.197] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  6185.197] (EE) Screen 1 deleted because of no matching config section.
[  6185.197] (II) UnloadModule: "modesetting"

and xrandr --listproviders says:

Providers: number : 1
Provider 0: id: 0x1b7 cap: 0x1, Source Output crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-0

which probably explains why the screen is "blank".

So, it seems with the autoconfiguration, I can't load the nvidia driver, but with fully specified xorg.conf, I can only activate one of the video cards due to how the Screen section is specified.

I'm not quite sure what needs to be done here. My thoughts are:

  1. Figure out if there is some sort of manual configuration that delivers a working setup.
  2. Maybe the patched x11/xorg-server didn't compile what is needed (are there any additional files it should install or symbols in a binary I can check?)
  3. Build an xorg-server from main on Gitlab and see what happens (It was the 90s when I last did that to try glx with a Riva 128, so I guess it will bring back memories)
  4. Could there be something that is in -Current that isn't yet in 13.2-RELEASE?

I'll probably just create a boot environment and try #3 for now, but I'm happy for other advice.

Thanks again for the write-up, it did help me understand better how the OutputClass fits in with all the xorg.conf snippets I was seeing. Plus, it may actually work for people with slightly newer cards.

amshafer commented 1 year ago

Thanks for the details! I would remove your xorg.conf completely if you didn't already in the first things you outlined. iirc you can't half specify things manually there and then have the autoconfiguration pick up the second half. You have to not configure anything and let it do everything.

Also, you are missing the ModulePath entries in the OutputClass it looks like. I think you used the earlier example where I left them out to try to clarify things. Maybe try something like this:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    ModulePath "/usr/local/lib/nvidia/xorg"
    ModulePath "/usr/local/lib/xorg/modules"
EndSection

The other thing I would try is installing the X server from git, since that's what I've done. Just to rule out any issues applying the patches. I don't think there's anything in CURRENT that you're missing out on.

NorwegianRockCat commented 1 year ago

I would prefer autoconfigure instead of xorg.conf so, no problem removing xorg.conf.

Yes, I removed the module path because I didn't have a /usr/local/lib/nvidia/xorg. Am I missing something there? It appears that /usr/local/lib/xorg/modules is picked up without the extra ModulePath, so that is why I removed it.
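A small check along these lines can confirm which of the ModulePath directories exist on a given install. The two paths are the ones from the OutputClass example; the function name is hypothetical.

```shell
# Hypothetical helper: report which of the given directories exist.
# Returns non-zero if any of them is missing.
check_module_paths() {
    rc=0
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: `check_module_paths /usr/local/lib/nvidia/xorg /usr/local/lib/xorg/modules`.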

Otherwise, I did clone the master branch and was able to build the Xserver. That did seem to produce a different warning.

[    20.272] (II) LoadModule: "glx"
[    20.273] (II) Loading /usr/local/lib/xorg/modules/extensions/libglx.so
[    20.289] (II) Module glx: vendor="X.Org Foundation"
[    20.289]    compiled for 1.21.1.99, module version = 1.0.0
[    20.289]    ABI class: X.Org Server Extension, version 10.0
[    20.289] (II) LoadModule: "nvidia"
[    20.289] (II) Loading /usr/local/lib/xorg/modules/drivers/nvidia_drv.so
[    20.298] (II) Module nvidia: vendor="NVIDIA Corporation"
[    20.298]    compiled for 1.6.99.901, module version = 1.0.0
[    20.298]    Module class: X.Org Video Driver
[    20.298] ================ WARNING WARNING WARNING WARNING ================
[    20.298] This server has a video driver ABI version of 26.1 that is not
supported by this NVIDIA driver.  Please check
http://www.nvidia.com/ for driver updates or downgrade to an X
server with a supported driver ABI.
[    20.298] =================================================================
[    20.298] (EE) NVIDIA: Use the -ignoreABI option to override this check.
[    20.298] (II) UnloadModule: "nvidia"
[    20.298] (II) Unloading nvidia
[    20.298] (EE) Failed to load module "nvidia" (module requirement mismatch, 0)
[    20.298] (EE) No drivers available.
[    20.298] (EE) 

I tried an X -ignoreABI, but things just crashed. I'm not sure what went wrong since it claims that the nvidia_drv was compiled for 1.6.99.901, while everything else is compiled for 1.21.1.99. I suspect some stale files somewhere. Should I recompile the nvidia-driver after the master Xserver was installed?
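To compare what each module claims it was compiled for, the log can be summarized with a sketch like the one below. It assumes the "Module NAME:" line is followed by a "compiled for VERSION," line, as in the excerpts in this thread; the function name is made up.

```shell
# Hypothetical helper: read an Xorg log on stdin and print
# "module version" pairs taken from the "compiled for" lines.
module_versions() {
    awk '
        /\(II\) Module [^:]+:/ {
            name = $0
            sub(/.*\(II\) Module /, "", name)   # drop everything before the name
            sub(/:.*/, "", name)                # drop everything after the name
        }
        /compiled for / && name != "" {
            ver = $0
            sub(/.*compiled for /, "", ver)
            sub(/,.*/, "", ver)
            print name, ver
            name = ""
        }
    '
}
```

Usage: `module_versions < /var/log/Xorg.0.log`.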

But... just in case, could you share the git commit hash you built your Xserver from and maybe your meson line? I tried to follow the port Makefile, but I might have missed something.

I feel the solution is close...

amshafer commented 1 year ago

Ah hm I forgot about the updated ABI issue. There's probably nothing wrong on your end, it's just that the latest X server code has new additions that your driver version doesn't handle yet. You can ignore it like you are but if it crashes there's probably a mismatch due to you ignoring it.

I went ahead and added a patch to the xorg-server port in the latest commit here, so you should be able to use that (or something like it) to build an older X server with my change. I assume you did this already since you said you built with my change. Please note I haven't tried this commit yet since I don't have time today, but I'll give it a go tomorrow and make sure it works so you and whoever runs into this next can just use that.

NorwegianRockCat commented 1 year ago

Yes, it was a segfault with -ignoreABI and I didn't bother to write it down. Yes, your patch looks similar to what I did, but I'll try it with your patch instead. I believe I may just clean out all packages and try from scratch. I'll see if I have time this evening or later this week. I'll follow along here as well.

amshafer commented 1 year ago

Okay I can confirm that the following config works (tested on current but release should be fine). Installed xorg-server, nvidia-driver, and nvidia-drm-510-kmod from here: https://github.com/amshafer/freebsd-ports/

root@legion:~ # cat /usr/local/etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf
Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    ModulePath "/usr/local/lib/nvidia/xorg"
    ModulePath "/usr/local/lib/xorg/modules"
EndSection

root@legion:~ # cat /usr/local/etc/X11/xorg.conf.d/20-amdgpu.conf                
Section "OutputClass"
    Identifier "AMD"
    MatchDriver "amdgpu"
    Driver "amdgpu"
    Option "PrimaryGPU" "yes"
EndSection

I think that should make things pretty easy to get going out of the box but let me know if you have issues. Also, if you happen to give this a go on CURRENT know that the top of tree is currently broken, I submitted a fix here that you'll need.

NorwegianRockCat commented 1 year ago

Thank you for the information. I'll try to find some time this evening to try this out and let you know how it works!

amshafer commented 1 year ago

Oh forgot to mention that I had force pushed that ports tree, so you should grab the latest from it and reinstall all the mentioned ports.

NorwegianRockCat commented 1 year ago

Well, I gave it a shot.

I added your ports tree as a remote, pulled your changes in, and checked out your main branch.

I built and installed x11-servers/xorg-server, x11/nvidia-driver, and graphics/nvidia-drm-510-kmod in that order. I chose the default configuration options for x11/nvidia-driver.
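Roughly, that build order can be scripted as below. This is only a sketch: it assumes the forked tree is checked out at PORTSDIR (default /usr/ports) and that the usual `make deinstall install clean` targets are wanted; the function name and the DRY_RUN switch are made up for illustration.

```shell
# Hypothetical helper: rebuild the three ports in the order used above.
# Set DRY_RUN=1 to print the make commands instead of running them.
PORTSDIR=${PORTSDIR:-/usr/ports}
rebuild_ports() {
    for origin in x11-servers/xorg-server x11/nvidia-driver graphics/nvidia-drm-510-kmod; do
        ${DRY_RUN:+echo} make -C "$PORTSDIR/$origin" deinstall install clean || return 1
    done
}
```

`DRY_RUN=1 rebuild_ports` shows what would be executed without touching the system.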

I seem to still get similar results, which I guess is somewhat encouraging because it means that I wasn't doing too much wrong earlier :-) Regardless, let me document.

In all of these cases, but the nvidia blank screen, xrandr --listproviders will only list the modesetting provider.

So, I'm a bit at a loss for what is going wrong with doing the offload with xrandr or with "PrimaryGPU". But, I will say that the following things work:

I will admit that when I ran Ubuntu on this machine, this was similar to what I could do then. That is, I could run specific programs on the Nvidia GPU, but not have it arbitrarily "decide" who is doing rendering. I will admit that I /never/ tried the latter, so it may have worked.

Regardless, this set up has its uses (e.g., I can at least dump known GPU intensive tasks to the Nvidia GPU when I start them).

Could it be a quirk of the T560 or the Nvidia card (NVIDIA GeForce 940MX) that keeps the PrimaryGPU stuff from working?

If I had another more modern laptop, I would try to confirm, but alas, I only have laptops of Skylake vintage or older. I have a T420 with an Nvidia card as well, but that is too old (NVidia NVS 4200M).

I can try to get -current on the laptop if you think that will help, but it will probably have to wait some days as I imagine I would have to rebuild the xorg-server and kmod ports too.

Any other thoughts?

amshafer commented 1 year ago

Can you provide logs? Most importantly /var/log/messages and /var/log/Xorg.0.log, assuming that's the right log file your X server used. Also the contents of /usr/local/etc/X11/xorg.conf.d/ if that's changed. You could also do a full nvidia-bug-report.sh if you want.

amshafer commented 1 year ago

Also, I noticed a missing option in the Xorg build, and fixed a panic in nvidia-drm. So you'll want to refetch my ports tree and reinstall those. I think I may have reproduced your issue now so looking into that.

NorwegianRockCat commented 1 year ago

Ah, OK. I won't be in front of this laptop today, but I can get the logs tomorrow, if they are still interesting, and try out the new ports then. The missing option in Xorg seems like it could explain some of the issue.

amshafer commented 1 year ago

Okay I think I know what the problem is. I rather stupidly forgot that my fix for libudev-devd has not made it into the ports tree yet as there isn't an updated version for it, and that's the missing piece for it to autoconfigure devices properly. Can you try installing that?

I updated the PRIME guide to include that in the setup instructions.

amshafer commented 1 year ago

Note you can now grab libudev-devd version 0.5.1 from ports.
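To check whether an installed version is new enough, a comparison like the following sketch may help. It assumes a sort(1) with -V (version sort) support; the function name is hypothetical, and 0.5.1 is the version mentioned above.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `version_at_least "$(pkg query %v libudev-devd)" 0.5.1 && echo "new enough"`.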

NorwegianRockCat commented 1 year ago

Yes! Building with the additional fix in x11/xorg-server from your ports tree and the new devel/libudev-devd commit did the trick. I went back to the autoconfigured items that I posted earlier and both were picked up and did the right thing. Now when I run xrandr --listproviders I get:

Providers: number : 2
Provider 0: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 5 associated providers: 0 name:modesetting
Provider 1: id: 0x24c cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-G0

Excellent!

Thank you for helping me out with this and providing good instructions. And of course, the legwork to bring this to FreeBSD. I'm looking forward to playing around a bit more with this.

One last thing, in your updated instructions, the -- in --setprovideroutputsource seems to have been converted to an en-dash (–), so you may want to fix that.

I'm closing this issue!