Open vishwin opened 4 months ago
One last little detail: what does the display setup for this look like? Just the laptop screen, or is there an external monitor plugged in as well? When I try with an external monitor I hit the panic in #21, so I'm assuming you're not doing that.
Another thing to check would be that you have two cardN entries in /dev/dri/, but I'm assuming that's the case since it seems everything initializes correctly.
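A quick way to run that check is sketched below. The helper name count_dri_cards is made up for illustration; it only counts cardN nodes in a directory and assumes the DRM nodes live under /dev/dri as mentioned above.

```shell
# Count cardN nodes in a DRM device directory (defaults to /dev/dri).
# count_dri_cards is a hypothetical helper, not part of any package.
count_dri_cards() {
    dir="${1:-/dev/dri}"
    count=0
    for node in "$dir"/card[0-9]*; do
        # An unmatched glob stays literal, so confirm the node exists.
        if [ -e "$node" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

On a two-GPU PRIME setup, running count_dri_cards with no argument would be expected to print 2 if both drivers attached.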
Ah nvm, reproduced
For whatever strange reason I can only reproduce this when I load nvidia-drm before amdgpu. Can you test and see if you see the same? Maybe by loading them manually just to verify; I don't know what order the rc.conf variable loads things in.
fwiw if I load amdgpu and then nvidia-drm it works fine.
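To rule the load order in or out, something like the sketch below could help. It assumes the modules are listed in the kld_list variable in rc.conf and that entries are loaded left to right (worth verifying on your system); loads_before is a made-up helper name.

```shell
# Succeeds if module $2 appears before module $3 in the
# space-separated module list $1. Hypothetical helper.
loads_before() {
    first_pos=0
    second_pos=0
    pos=0
    for mod in $1; do
        pos=$((pos + 1))
        # Entries may be bare names or full paths ending in .ko.
        name=$(basename "$mod" .ko)
        if [ "$name" = "$2" ] && [ "$first_pos" -eq 0 ]; then
            first_pos=$pos
        fi
        if [ "$name" = "$3" ] && [ "$second_pos" -eq 0 ]; then
            second_pos=$pos
        fi
    done
    [ "$first_pos" -gt 0 ] && [ "$second_pos" -gt 0 ] && [ "$first_pos" -lt "$second_pos" ]
}

# Example on FreeBSD (not run here):
#   loads_before "$(sysrc -n kld_list)" amdgpu nvidia-drm && echo "amdgpu is listed first"
```

For a manual test, removing both modules from rc.conf and running kldload amdgpu followed by kldload nvidia-drm as root (or the reverse) should make the order unambiguous.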
nvidia-drm has always been loaded after i915kms takes over the framebuffer from UEFI, as shown with the LinuxKPI I2C lines.
One thing you can check while I keep looking at this is the contents of /usr/local/share/X11/xorg.conf.d/20-nvidia-drm-outputclass.conf and (if it exists) /usr/local/share/X11/xorg.conf.d/10-intel.conf:
root@:~ # cat /usr/local/share/X11/xorg.conf.d/20-nvidia-drm-outputclass.conf
Section "OutputClass"
Identifier "nvidia"
MatchDriver "nvidia-drm"
Driver "nvidia"
Option "PrimaryGPU" "yes"
ModulePath "/usr/local/lib/nvidia/xorg"
ModulePath "/usr/local/lib/xorg/modules"
EndSection
root@:~ # cat /usr/local/share/X11/xorg.conf.d/10-intel.conf
Section "OutputClass"
Identifier "intel"
MatchDriver "i915"
Driver "modesetting"
Option "PrimaryGPU" "yes"
EndSection
This is a working config for me on my Intel PRIME machine. I'm wondering if your setup switched when the .conf files were overwritten during the latest package update and set the NVIDIA GPU as the primary. In that case you would see the black screen until you ran xrandr --auto. Note that if you do that right now or use an external monitor you'll still hit the panic I'm looking into.
You should be able to force Intel as the primary by ensuring Option "PrimaryGPU" "yes" is in the intel.conf, which you might have to create as iirc by default it isn't installed by a package. Hopefully that helps.
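To see at a glance which snippets currently claim PrimaryGPU, a small sketch like this may be useful. The helper name primary_gpu_files is made up; the default directory is the one from the cat commands above, and the pattern only looks for an uncommented Option "PrimaryGPU" line with a truthy value.

```shell
# Print the names of .conf files in an xorg.conf.d directory that
# enable Option "PrimaryGPU". Hypothetical helper.
primary_gpu_files() {
    dir="${1:-/usr/local/share/X11/xorg.conf.d}"
    for f in "$dir"/*.conf; do
        [ -f "$f" ] || continue
        if grep -Eiq '^[[:space:]]*Option[[:space:]]+"PrimaryGPU"[[:space:]]+"(yes|true|on|1)"' "$f"; then
            echo "${f##*/}"
        fi
    done
}
```

With the two files shown above, this would list both 20-nvidia-drm-outputclass.conf and 10-intel.conf; if only the NVIDIA file shows up, the Intel snippet is probably missing the option.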
I have all of the above in xorg.conf.d/ except for Option "PrimaryGPU" "yes" under intel and specifying the nvidia module paths. Leaving them out worked in 535.146.02. Don't have access to the machine for another couple days so will update when I get back.
Setting Option "PrimaryGPU" "yes" under intel allows X to continue bringing the displays/screens up, but this effectively becomes an Intel-only setup, as if the nvidia modules were never loaded. All rendering, GL providers, etc are done by intel via Mesa.
In 535.146.02, I never had to run any xrandr command for the nvidia (headless) to handle rendering whilst intel handled display. On this version, when trying to execute the recommended xrandr commands at any point, with nvidia as PrimaryGPU:
% xrandr --setprovideroutputsource modesetting NVIDIA-0
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 140 (RANDR)
Minor opcode of failed request: 35 (RRSetProviderOutputSource)
Value in failed request: 0x217
Serial number of failed request: 16
Current serial number in output stream: 17
% xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x217 cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-0
Provider 1: id: 0x241 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 8 associated providers: 0 name:modesetting
Note that with intel as PrimaryGPU:
% xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x49 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 8 associated providers: 0 name:modesetting
Provider 1: id: 0x2c7 cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-G0
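For comparing the two configurations side by side, the provider lines can be condensed with a sed filter along these lines. This is only a sketch that assumes the line layout shown in the output above; summarize_providers is a made-up name.

```shell
# Reduce `xrandr --listproviders` output (read from stdin) to
# "name cap=... crtcs=... outputs=..." per provider.
# Hypothetical helper; assumes the line layout shown above.
summarize_providers() {
    sed -n 's/^Provider [0-9]*: id: 0x[0-9a-fA-F]* cap: \(0x[0-9a-fA-F]*\).* crtcs: \([0-9]*\) outputs: \([0-9]*\) .*name:\(.*\)$/\4 cap=\1 crtcs=\2 outputs=\3/p'
}

# Example (not run here):
#   xrandr --listproviders | summarize_providers
```

Fed the nvidia-as-PrimaryGPU output above, this prints "NVIDIA-0 cap=0x0 crtcs=0 outputs=0" and "modesetting cap=0xf crtcs=3 outputs=8", which makes it easy to see that the NVIDIA provider advertises no capabilities in either ordering.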
Does it work with NVIDIA as the primary GPU if you run with xrandr --auto though? That's the missing bit for me, until I do that the laptop screen stays black. I don't know why that would suddenly be required again in 550, the logic for deciding this stuff in the X server can be wacky sometimes.
xrandr --auto didn't do anything, so no.
Okay so that's different to what I've seen then. Out of curiosity in PrimaryGPU intel mode does running things on the NVIDIA GPU through the prime env variables work? i.e. something like:
$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation
Sorry for all the requests, since I don't reproduce exactly what you're seeing I'm just trying to figure out what's working.
glxinfo with those environment variables worked. But of course I don't want to keep passing them.
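As a stopgap for per-program offload, the two variables can be wrapped in a small shell function so they don't have to be typed each time. The name nvrun is made up for this sketch; it only sets the two variables quoted above for the one command it launches.

```shell
# Run a single command with the PRIME render offload variables set,
# without exporting them into the rest of the session.
# nvrun is a hypothetical name.
nvrun() {
    env __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# Example (not run here):
#   nvrun glxinfo | grep vendor
```

This doesn't restore the old always-on behaviour, it just makes the offload test less tedious to repeat.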
There are issues with the prebuilt nvidia-drm pkg, is that what you are using? Or are you building from ports? If you're not building from ports can you give that a try?
related: https://reviews.freebsd.org/D44308
all only ever built from ports
fwiw adding __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia to .xprofile as a test to forcibly mimic the old behaviour results in generally unusable rendered results. Even alacritty (GPU-accelerated terminal) results in a black (unrendered) window.
D44308 allows X startup to continue and eventually return to the old behaviour from 535.146.02. However, rendering is a bit glitchy, occasionally showing the immediate previous frames, especially around the refresh rate such as watching high frame rate video or fast typing.
Some progress is good. What desktop env/etc is this with? Also what drm-kmod version are you using?
Latest -CURRENT so latest drm-61-kmod due to the API change. Desktop is Cinnamon, which I've been needing to update for time, especially recently as muffin has been sus.
I still haven't been able to reproduce any of the misrendering issues which is odd. I'll have to give Cinnamon a try.
Can you include the conftest results from 535 and 550 if possible? Just to check that nothing obvious went wrong with the compatibility detection. Something like cat work/NVIDIA.../(nvidia for 535)/src/nvidia-drm/conftest/* should grab the function.h, type.h, etc that get generated during the build.
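If it helps, the generated headers can be dumped with a filename banner per file so the 535 and 550 results are easy to diff. This is a sketch only: dump_conftest is a made-up helper, and the directory arguments in the example are placeholders for the conftest paths of each build.

```shell
# Print every regular file in a conftest directory, each preceded by
# a banner with its name. Subdirectories are skipped.
# Hypothetical helper.
dump_conftest() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        printf '==> %s <==\n' "${f##*/}"
        cat "$f"
    done
}

# Example (not run here; CONFTEST_535 and CONFTEST_550 are placeholders):
#   dump_conftest "$CONFTEST_535" > conftest-535.txt
#   dump_conftest "$CONFTEST_550" > conftest-550.txt
#   diff -u conftest-535.txt conftest-550.txt
```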
Finally back on the target machine; latest upstream Cinnamon (not in ports yet) still rendering glitchy with occasional falls off the bus. More pronounced with a multiple-screen setup. Let me see if I can get the conftest
Wait so with 535 everything works fine (including no glitching) but with 550 it falls off the bus? That's very odd, usually falling off the bus is indicative of some kind of power issue? I'd double check that 535 doesn't also fall off the bus in order to confirm if there's a regression in 550.
Not to prematurely blame Cinnamon, but it would be interesting to see if your glitchy rendering happens on xfce4 as well. If xfce also shows the glitching and it doesn't happen with 535 I'd take that as confirmation that something is wrong with nvidia-drm.
535 does not suffer from glitchy rendering but also falls off the bus occasionally. However, the glitchiness isn't really noticeable on a single screen setup, like just the laptop display, but is certainly pronounced with multiple screens like my laptop display + external monitor.
The falling off the bus seems to trigger randomly, mostly on pure GTK programs, particularly simpler dialog box or settings-type stuff, as if it is struggling to render something that shouldn't need much effort to draw. Specifically, I've had it happen with scrolling through a settings dialog, clicking a button that I can't release because the GPU falls off the bus right there, but also just rendering a PDF/image preview in the file manager a couple times. Could have to do with compositing? I'm dubious about power issues as the GPU itself is headless and not exactly replaceable, and these have all happened whilst plugged in.
Won't be able to properly test xfce until after returning from BSDCan and SELF mid-next month because the external monitor will not be available for those.
535 does not suffer from glitchy rendering
Seems like I need to test with Cinnamon then. I don't think I've ever tried that before, although last time I looked into this issue it was with XFCE and I didn't see glitching there.
but is certainly pronounced with multiple screens like my laptop display + external monitor.
What is the glitching like? Color corruption or tearing or something else? Normally I'd say something like this is an issue with the compositor but since it doesn't happen on 535 it sounds like something triggered by nvidia-drm.
The falling off the bus still seems unrelated, and like I said really is normally something to do with power. Even if it's plugged in I think it normally still goes through the battery, which can go bad, but you might be able to disable the battery completely and then test, if your laptop BIOS allows it.
Glitchiness not so much tearing (which I always expect), but rather to the effect of laggy refresh rate and momentary displays of previous frames. Most pronounced when viewing a 60 fps video on a 60 Hz refresh rate display.
I no longer have an internal battery so I disconnected the external battery, we'll see what happens.
Just experienced a falling off the bus without the battery.
Any ACPI or other power messages in dmesg before it falls off the bus?
The laggy frames do sound interesting, that could conceivably be explained by nvidia-drm. Last time I tried reproducing with simple programs, so I'll try with a fullscreen video.
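For the dmesg check, a rough filter like the one below could narrow things down. The keyword list is a guess rather than an exhaustive set of relevant messages, and power_related_lines is a made-up name.

```shell
# Show numbered lines from stdin that mention ACPI, power, battery,
# or NVIDIA kernel messages (NVRM / Xid). The pattern is a guess.
power_related_lines() {
    grep -Ein 'acpi|power|battery|nvrm|xid'
}

# Example (not run here):
#   dmesg | power_related_lines
```

The line numbers make it easier to see whether anything power-related shows up shortly before the NVRM message about the GPU falling off the bus.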
Any ACPI or other power messages in dmesg before it falls off the bus?
never
hw.nvidiadrm.modeset=1 set in /boot/loader.conf, kernel modules loaded from /etc/rc.conf. X startup stalls with the screen off after libinput initialises the last pointing device. Machine is otherwise responsive and X is able to be zapped.
550.54.14 Xorg.log
550.54.14 dmesg
535.146.02 dmesg (Xorg.0.log now missing :pensive:)