amshafer / nvidia-driver

Fork of the Nvidia FreeBSD driver to port the nvidia-drm.ko module from Linux

Problems with UI animations and mouse scroll/movement and others #9

Open iron-udjin opened 1 year ago

iron-udjin commented 1 year ago

Hello,

OS: 13.1-STABLE
nvidia-driver: 525.60.11

1:

It's reproducible at least with chromium and telegram-desktop. When you trigger any action that requires a window animation (opening a new tab, clicking a link, sending/receiving a message, etc.), the mouse pointer and screen scrolling slow down. Once the animation finishes, everything works as expected again.

According to the chromium changelog, a similar problem should have been fixed in the 109 release: https://bugs.chromium.org/p/chromium/issues/detail?id=1270089#c27 ...but it's not fixed yet, and the same problem occurs with telegram-desktop. I assume the problem could be in nvidia-driver. With UHD Graphics 630 and i915kms everything works fine.

I tried setting #ozone-platform-hint=wayland in chrome://flags but it didn't help.

2:

Programs that change the color temperature (redshift or wlsunset) don't work. For example, redshift freezes with 100% CPU usage.

3:

nvidia-settings segfaults:

Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0  0x0000000000000001 in ?? ()
[Current thread is 1 (LWP 105644)]
(gdb) bt
#0  0x0000000000000001 in ?? ()
#1  0x000000082a21ac1a in ?? () from /usr/local/lib/libffi.so.8
#2  0x000000082a21a4e2 in ?? () from /usr/local/lib/libffi.so.8
#3  0x000000082a21a0dd in ffi_call () from /usr/local/lib/libffi.so.8
#4  0x0000000829f1eec0 in ?? () from /usr/local/lib/libwayland-client.so.0
#5  0x0000000829f1cec8 in ?? () from /usr/local/lib/libwayland-client.so.0
#6  0x0000000829f1c837 in wl_display_dispatch_queue_pending () from /usr/local/lib/libwayland-client.so.0
#7  0x0000000829f1c10e in wl_display_dispatch_queue () from /usr/local/lib/libwayland-client.so.0
#8  0x0000000829f1bd5b in wl_display_roundtrip_queue () from /usr/local/lib/libwayland-client.so.0
#9  0x000000082902cc9f in get_wayland_output_info () from /usr/local/lib/libnvidia-wayland-client.so.525.60.11
#10 0x000000000040ae6e in wconn_get_wayland_output_info ()
#11 0x0000000000408a58 in main ()

4:

multimedia/libva-vdpau-driver segfaults:

 $ vainfo 
Trying display: wayland
libva info: VA-API version 1.17.0
libva info: User environment variable requested driver 'vdpau'
libva info: Trying to open /usr/local/lib/dri/vdpau_drv_video.so
libva info: Found init function __vaDriverInit_1_17
Floating exception (core dumped)

Bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268657

My settings: ~/.tcshrc:

mkdir -p /tmp/`id -u`
setenv XDG_RUNTIME_DIR /tmp/`id -u`
setenv XDG_CACHE_HOME $HOME/.cache
setenv XDG_SESSION_TYPE wayland
setenv WLR_RENDERER vulkan

setenv MOZ_ENABLE_WAYLAND 1
setenv MOZ_DBUS_REMOTE 1
setenv GDK_BACKEND wayland
#setenv QT_QPA_PLATFORM wayland
setenv QT_QPA_PLATFORM wayland-egl
setenv QT_QPA_PLATFORMTHEME qt5ct
setenv QT_WAYLAND_FORCE_DPI physical
setenv QT_WAYLAND_DISABLE_WINDOWDECORATION 1
# nvidia
setenv WLR_NO_HARDWARE_CURSORS 1
setenv LIBVA_DRIVER_NAME vdpau
setenv GBM_BACKEND nvidia-drm
setenv __GLX_VENDOR_LIBRARY_NAME nvidia

amshafer commented 1 year ago

Thanks for the excellent report. How bad is the slowdown for you? I see some very slight slowdown but nothing crazy.

With regards to nvidia-settings crashing, that looks to be a libffi or libwayland problem with uninitialized memory:

==267== Syscall param sendmsg(sendmsg.msg_control) points to uninitialised byte(s)
==267==    at 0x4B2E89A: _sendmsg (in /lib/libc.so.7)                                               
==267==    by 0x4E42F75: ??? (in /lib/libthr.so.3)                       
==267==    by 0x4B2B499: sendmsg (in /lib/libc.so.7)   
==267==    by 0x5C9C0E3: wl_connection_flush (connection.c:313)
==267==    by 0x5C9B4EE: wl_display_flush (wayland-client.c:2154)
==267==    by 0x5C9ABFC: wl_display_dispatch_queue (wayland-client.c:1933)
==267==    by 0x6B900FB: ??? (in /usr/local/lib/libgdk-3.so.0.2404.30)
==267==    by 0x6B2E9F8: gdk_display_manager_open_display (in /usr/local/lib/libgdk-3.so.0.2404.30)
==267==    by 0x68C9912: gtk_init_check (in /usr/local/lib/libgtk-3.so.0.2404.30) 
==267==    by 0x5F133C0: ctk_init_check (ctkui.c:32)                      
==267==    by 0x40B519: ????                     
==267==  Address 0x7fc0006bc is on thread 1's stack                       
==267==  in frame #3, created by wl_connection_flush (connection.c:290)      
==267==  Uninitialised value was created by a stack allocation
==267==    at 0x5C9BECD: wl_connection_flush (connection.c:290)
==267==                                                                                              
==267== Use of uninitialised value of size 8                                                         
==267==    at 0x4852B44: strcmp (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==267==    by 0x5A92C98: ????
==267==    by 0x5EBC569: ffi_call_unix64 (unix64.S:105)
==267==    by 0x5EBB801: ffi_call_int (ffi64.c:672)         
==267==    by 0x5EBB395: ffi_call (ffi64.c:691)                                                      
==267==    by 0x5C9D8B6: wl_closure_invoke (connection.c:1025)
==267==    by 0x5C9BAE8: dispatch_event (wayland-client.c:1595)
==267==    by 0x5C9B496: dispatch_queue (wayland-client.c:1741)
==267==    by 0x5C9B496: wl_display_dispatch_queue_pending (wayland-client.c:1983)
==267==    by 0x5C9AD4D: wl_display_dispatch_queue (wayland-client.c:1959)
==267==    by 0x5C9A9EA: wl_display_roundtrip_queue (wayland-client.c:1370)
==267==    by 0x5A92DDC: ????
==267==    by 0x40B218: ????
==267==  Uninitialised value was created by a stack allocation                  
==267==    at 0x5EBB43E: ffi_call_int (ffi64.c:585)

I'll have to take a closer look at the nvidia vdpau module to see why that hits a floating point exception.

I'll look into these things, especially the slowdown issue, but they most likely aren't caused by nvidia-drm itself. So far I'm guessing they are just new paths that are now hit on wayland that nobody has tested yet.

iron-udjin commented 1 year ago

The mouse lag lasts as long as the animation does. Sometimes it's 1-2 seconds, sometimes longer. When you go to click something, your muscle memory knows how far away the button is. If the pointer lags while you are already moving, you end up short of the target. It's terribly uncomfortable. I hope you understand what I mean.

I can accept that it's possibly not an nvidia-drm issue. But the weird thing is that it also happens with telegram-desktop. Possibly the problem is somewhere between wayland and the DRM subsystem.

P.S.: Please add notes on the correct way to remove this driver from the system, to avoid conflicts with the driver from ports. That would help people who want to test it but are afraid of leftover or conflicting libraries and binaries when they later install the driver from ports.

Thank you.

amshafer commented 1 year ago

Also, would you mind adding the specs of the machine you see this on? Primarily CPU/GPU/memory/is your disk an ssd

amshafer commented 1 year ago

Hm that's not really what I'm seeing. If you've built things (sway, libwayland, etc) from source you might want to double check that you did an optimized build. See meson's -Dbuildtype for example. Any lag I see is very minute and could just be considered part of SW rendering the cursor. Also building the kernel modules without any debug flags could make a difference too, although it runs fast enough for me with them.

Although there isn't a proper way to uninstall, you should be fine just doing a pkg install nvidia-driver, which will overwrite all the driver files. The only other step would be removing /boot/modules/nvidia-drm.ko.

iron-udjin commented 1 year ago

Also, would you mind adding the specs of the machine you see this on? Primarily CPU/GPU/memory/is your disk an ssd

https://bsd-hardware.info/?probe=b528c0bbe3 But I don't know why it detects my DE as Xfce.

If you've built things (sway, libwayland, etc) from source you might want to double check that you did an optimized build.

There is no DEBUG option available for the wayland or sway ports. My /etc/make.conf contains only: CFLAGS+= -O2 -pipe -march=native So I've definitely built optimized versions of the ports. The kernel and modules were built without debug. You can check my config in the hardware report.

amshafer commented 1 year ago

Sorry but I'm still not able to reproduce the lag issue you're having, even when using all the same versions on an install running stable and using your env settings. Can you please try a couple things if you're able to?

Basically you're going to launch things like normal, open chrome/whatever, launch the dtrace command above, immediately start triggering the lag in chrome like you normally would (do this for at least 10 seconds), then give dtrace a Ctrl-C to kill it. Run the rest of the commands to generate the flamegraph svg image, which you can link here. Hopefully that will show an obvious hole where sway gets blocked while rendering.

Sorry for the list of requests, I've tried a wide variety of changes with this setup but still haven't reproduced it.

iron-udjin commented 1 year ago

Sorry for the late reply (there is a war going on in my country); I didn't have enough time to play with it.

I'm guessing you'll see libinput continue printing even while sway freezes, but will be good to verify.

No, it doesn't. When I click in chrome to create a new tab and don't move the mouse, it prints only:

event8   POINTER_BUTTON          +4.619s    BTN_LEFT (272) pressed, seat count: 1
event8   POINTER_BUTTON          +4.758s    BTN_LEFT (272) released, seat count: 0

Actually, it doesn't freeze; it slows down mouse movement.

Here is the dtrace of sway. After starting it I opened a few tabs in chrome and then closed them one by one. The mouse slows down only on the first few tabs. After the 3rd or 4th tab the lag is not so noticeable. (link: out)

You may want to try the 525.78.01 branch, it has a couple fixes that shouldn't impact you, but would be good to make sure.

The traces above are already from the 525.78.01 driver.

Any fancy way of launching sway?

Yes, there is. I couldn't find a launcher that lets me start sway without logging in. Previously I used slim, but it doesn't have wayland support. So I start sway from /etc/rc.local:

/usr/bin/su -l iron -c "bash -c '/usr/local/bin/dbus-run-session -- /usr/local/bin/sway &> /tmp/sway.log'"

Does dmesg have any obvious warning signs?

Nothing interesting:

nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.78.01  Mon Dec 26 05:26:31 UTC 2022 
nvidia0: <NVIDIA GeForce RTX 2080 Ti> on vgapci0 
vgapci0: child nvidia0 requested pci_enable_io
[drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[drm] Initialized nvidia-drm 0.0.0 20160202 for nvidia0 on minor 0
nvidia-modeset: WARNING: GPU:0: Acer XB271HU (DP-4): Failed to initialize G-SYNC

With the new driver and updated drm-kmod the mouse slowdowns are not so noticeable, but they still exist.

amshafer commented 1 year ago

Sorry to hear you're affected by that. Thanks for providing all those details.

Actually, it doesn't freeze; it slows down mouse movement.

By this you mean that the mouse still moves but slowly and the drawing of the cursor stutters? More on this later, but does drawing anything else in firefox/sway slow down when the mouse does?

When I click in chrome to create a new tab and don't move the mouse, it prints only

What about when you move the mouse side to side while pressing the buttons. Do you see motion events in between the button press events or do the motion events "stop"/slow at the same time you see the issue in sway?
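One way to answer this from captured output is to measure the time between consecutive events. The following is only a minimal sketch, assuming the capture has the same column layout as the two POINTER_BUTTON lines quoted earlier in this thread (device, event type, +seconds, details). The names `parse_events` and `find_gaps` and the 50 ms threshold are illustrative choices, not part of libinput.

```python
import re

# Matches lines such as:
#   event8   POINTER_BUTTON          +4.619s    BTN_LEFT (272) pressed, seat count: 1
EVENT_RE = re.compile(r"^\s*(event\d+)\s+(\S+)\s+\+(\d+(?:\.\d+)?)s\s*(.*)$")


def parse_events(lines):
    """Return (device, event_type, seconds, details) for each line that matches."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            events.append((m.group(1), m.group(2), float(m.group(3)), m.group(4).strip()))
    return events


def find_gaps(events, threshold_s=0.05):
    """Return (start, end, gap) for consecutive events more than threshold_s apart.

    The 50 ms default is an arbitrary illustrative threshold.
    """
    gaps = []
    for prev, cur in zip(events, events[1:]):
        gap = cur[2] - prev[2]
        if gap > threshold_s:
            gaps.append((prev[2], cur[2], round(gap, 3)))
    return gaps


# The two lines posted in this thread, used here as sample input.
sample = [
    "event8   POINTER_BUTTON          +4.619s    BTN_LEFT (272) pressed, seat count: 1",
    "event8   POINTER_BUTTON          +4.758s    BTN_LEFT (272) released, seat count: 0",
]
events = parse_events(sample)
for start, end, gap in find_gaps(events):
    print(f"no events between +{start:.3f}s and +{end:.3f}s ({gap * 1000:.0f} ms)")
```

With a capture that also contains motion events, a long gap that lines up with an animation would suggest the events themselves stop arriving, while an unbroken stream would suggest input is delivered on time and only drawn late.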

The other interesting test would be can you run something like youtube in the background without the issue while triggering the mouse slowdowns? i.e. run two firefox windows, with youtube (or anything that draws constantly) in one and recreating the slowdown in the other. If you see only the mouse slowdown but everything else renders smoothly without the same slowdown that would be good to know. Or maybe you'll see youtube slow/stutter/freeze at the same time the mouse does.

With the new driver and updated drm-kmod the mouse slowdowns are not so noticeable, but they still exist.

How much less noticeable? Do you happen to remember the version(s) you updated to that made it better?

iron-udjin commented 1 year ago

By this you mean that the mouse still moves but slowly and the drawing of the cursor stutters? More on this later, but does drawing anything else in firefox/sway slow down when the mouse does?

It feels like the mouse sensitivity drops while an animation is ongoing. As for the animation itself, it looks like the drawing FPS drops dramatically while a complex animation is running. It happens for a short period of time, for example while opening a new tab in chromium or scrolling a page. Also, when I send a message in telegram-desktop, after I hit Enter the block of previous messages above scrolls up with noticeable stuttering instead of a smooth scrolling animation.

What's weird is that it doesn't happen when I switch between existing tabs in chromium. Also, when I switch to a tab and start scrolling, the first animation runs at low FPS, but when I scroll back everything is fine and the animation is smooth.

What about when you move the mouse side to side while pressing the buttons. Do you see motion events in between the button press events or do the motion events "stop"/slow at the same time you see the issue in sway?

I don't see any slowdowns. Pointer movement slows down only while an animation is ongoing.

How much less noticeable? Do you happen to remember the version(s) you updated to that made it better?

A little bit noticeable. It was 525.60.11. Currently I'm on 525.78.01.

I've found possibly interesting sway log messages:

04:07:06.632 [ERROR] [wlr] [libinput] event7  - Logitech PRO X Wireless, class 0/0, rev 2.00/25.01, addr 9: client bug: event processing lagging behind by 21ms, your system is too slow
04:09:57.524 [ERROR] [wlr] [libinput] event7  - Logitech PRO X Wireless, class 0/0, rev 2.00/25.01, addr 9: client bug: event processing lagging behind by 22ms, your system is too slow
04:10:27.190 [ERROR] [wlr] [libinput] event7  - Logitech PRO X Wireless, class 0/0, rev 2.00/25.01, addr 9: client bug: event processing lagging behind by 22ms, your system is too slow

...but my system wasn't overloaded at that time.
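To line these messages up with the moments the slowdown is visible, the timestamps and lag values can be pulled out of the log. This is only a minimal sketch, assuming every relevant line has exactly the format of the three lines above. `lag_reports` is a hypothetical helper name, and reading the real log file is left to the caller.

```python
import re
from statistics import mean

# Captures the leading HH:MM:SS.mmm timestamp and the reported lag in milliseconds.
LAG_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) .*event processing lagging behind by (\d+)ms"
)


def lag_reports(lines):
    """Return (timestamp, lag_ms) for each libinput 'lagging behind' message."""
    reports = []
    for line in lines:
        m = LAG_RE.search(line)
        if m:
            reports.append((m.group(1), int(m.group(2))))
    return reports


# The three log lines posted above, used here as sample input.
sample_log = [
    "04:07:06.632 [ERROR] [wlr] [libinput] event7  - Logitech PRO X Wireless, class 0/0, rev 2.00/25.01, addr 9: client bug: event processing lagging behind by 21ms, your system is too slow",
    "04:09:57.524 [ERROR] [wlr] [libinput] event7  - Logitech PRO X Wireless, class 0/0, rev 2.00/25.01, addr 9: client bug: event processing lagging behind by 22ms, your system is too slow",
    "04:10:27.190 [ERROR] [wlr] [libinput] event7  - Logitech PRO X Wireless, class 0/0, rev 2.00/25.01, addr 9: client bug: event processing lagging behind by 22ms, your system is too slow",
]
reports = lag_reports(sample_log)
lags = [ms for _, ms in reports]
if lags:
    print(f"{len(lags)} reports, max {max(lags)} ms, mean {mean(lags):.1f} ms")
```

If the timestamps cluster around the moments a tab is opened or a message is sent, that would tie the warnings to the animations rather than to general system load.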

iron-udjin commented 1 year ago

Compare these two animations. One was recorded in sway with the i915kms driver, the other with nvidia-drm. Look at the animation of a new message appearing in the window.

i915kms driver:

https://user-images.githubusercontent.com/1779285/219192279-d1d8acb9-6fd7-4c4c-9f29-9987fc77c06a.mp4

nvidia-drm driver:

https://user-images.githubusercontent.com/1779285/219192254-b7e19cf2-b326-47c5-89e1-208c5c45ca7c.mp4

The difference between those animations may not be so noticeable in the video. But it is noticeable with other animations in chromium, together with the mouse slowdowns.