ValveSoftware / gamescope

SteamOS session compositing window manager

Strange flickering using gamescope for Steam games with NVIDIA #495

Open kakra opened 2 years ago

kakra commented 2 years ago

So I tested gamescope (compiled from latest master 3.11.31-beta4-6-g97288b8) yesterday on NVIDIA 515.43.04:

gamescope -f -U -w 1920 -h 1080 -- %command%

It's upscaling to 3840x2160 at 60 fps (according to the Steam FPS counter).

But across all games I tested, it feels like 60 fps while looking like the render buffers are being swapped in the wrong order, regardless of whether vsync is on or off. It's hard to describe, but I'll try: when turning the camera slowly, it looks smooth. When turning or moving around faster, it looks like 30 Hz flickering with graphics constantly jumping back and forth, as if the game is rendering frames 1,2,3,4,... but gamescope is showing 2,1,4,3,... It looks like games now use triple buffering instead of double buffering but show the wrong frame buffer on screen, which is supported by the fact that I'm still seeing screen tearing (even though it may be triple buffering).
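To make the described frame order concrete, here is a minimal Python sketch. It assumes adjacent frames are swapped pairwise, which is only the impression reported above and not confirmed gamescope behavior; `swap_pairs` is a hypothetical helper.

```python
def swap_pairs(frames):
    """Swap each adjacent pair of frames: 1,2,3,4 -> 2,1,4,3 (assumed pattern)."""
    out = list(frames)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


rendered = [1, 2, 3, 4, 5, 6]
shown = swap_pairs(rendered)

# Step between consecutive frames on screen; a negative step means the
# picture moves backwards in time.
steps = [b - a for a, b in zip(shown, shown[1:])]

print(shown)  # [2, 1, 4, 3, 6, 5]
print(steps)  # [-1, 3, -1, 3, -1]
```

Every second step goes backwards, so fast motion would appear to jump back and forth at half the frame rate, which would match the "30 Hz flickering" impression at 60 fps.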

As mentioned above: I can see screen tearing but this is probably some general issue with the NVIDIA driver and multi-monitor. Full composition pipeline isn't enabled (doing so doesn't fix tearing for me anyways, it looks like vsync is offset by around half a screen refresh, maybe due to using multiple monitors).

Also, sometimes after closing gamescope, my desktop turns upside down, occasionally flickering back to the right orientation. This can be solved by running another game without gamescope. It isn't solved by restarting kwin.

kakra commented 2 years ago

So I found the strange flickering may be caused by kwin because gamescope doesn't block compositing in fullscreen mode. Without kwin compositing, the flickering is gone. The intense tearing is still an issue, tho.

pchome commented 2 years ago

> graphics constantly jumping back and forth

I can see this in game menus: sometimes when selecting the next menu item, it looks like it jumps to the previous menu item and back.

More fun: `VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay gamescope -i -- stranglevk 40 vkcube`. This command displays two Mesa overlays, one for gamescope and one for vkcube. Both show 40 fps, but the frame time on vkcube is ~25 ms, while on gamescope it constantly jumps between ~16 ms and ~33 ms. When I randomly shake the mouse, the Mesa overlay on gamescope shows 60 fps and 16 ms. Much better: `VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay gamescope -i -r30 -- stranglevk 30 vkcube`
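One possible reading of the ~16 ms / ~33 ms jumps is plain vblank quantization. The Python sketch below uses a simplified model that is an assumption, not gamescope's actual scheduling: the client finishes frames at a fixed rate, and each frame becomes visible at the first refresh at or after it is ready. The 60 Hz output and the helper name `present_intervals_ms` are also assumptions.

```python
def present_intervals_ms(client_fps, refresh_hz, frames):
    """On-screen time between consecutive frames, in milliseconds.

    Simplified model (assumption): frame k is ready at k / client_fps seconds
    and is shown at the first refresh at or after that moment.
    """
    # Integer ceiling of k * refresh_hz / client_fps gives the refresh index.
    vblank_index = [-(-k * refresh_hz // client_fps) for k in range(frames + 1)]
    period_ms = 1000.0 / refresh_hz
    return [round((b - a) * period_ms, 1)
            for a, b in zip(vblank_index, vblank_index[1:])]


# 40 fps client on an assumed 60 Hz output: intervals alternate.
print(present_intervals_ms(40, 60, 6))  # [33.3, 16.7, 33.3, 16.7, 33.3, 16.7]

# 30 fps client with a 30 Hz refresh: every interval is the same.
print(present_intervals_ms(30, 30, 6))  # [33.3, 33.3, 33.3, 33.3, 33.3, 33.3]
```

Under this model a 40 fps client can never be paced evenly on a 60 Hz output, while 30 fps can, which would be consistent with the numbers reported above.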

Also, a ton of events are being generated (or something). glxgears process stats:

perf report

`glxgears`

```
 Performance counter stats for process id '30897':

            539.04 msec task-clock                #    0.024 CPUs utilized
              1536      context-switches          #    2.849 K/sec
               202      cpu-migrations            #  374.737 /sec
                 0      page-faults               #    0.000 /sec
         464782409      cycles                    #    0.862 GHz                      (67.05%)
         265246086      stalled-cycles-frontend   #   57.07% frontend cycles idle     (66.51%)
          96366893      stalled-cycles-backend    #   20.73% backend cycles idle      (65.96%)
         162872237      instructions              #    0.35  insn per cycle
                                                  #    1.63  stalled cycles per insn  (66.66%)
          32568835      branches                  #   60.420 M/sec                    (66.99%)
           4890428      branch-misses             #   15.02% of all branches          (66.84%)

      22.543913280 seconds time elapsed
```

`gamescope -- glxgears`

```
 Performance counter stats for process id '30991':

          36861.88 msec task-clock                #    0.988 CPUs utilized
             14069      context-switches          #  381.668 /sec
                34      cpu-migrations            #    0.922 /sec
                 0      page-faults               #    0.000 /sec
      113806258921      cycles                    #    3.087 GHz                      (66.62%)
        2333765348      stalled-cycles-frontend   #    2.05% frontend cycles idle     (66.68%)
      109240541753      stalled-cycles-backend    #   95.99% backend cycles idle      (66.74%)
        3658279819      instructions              #    0.03  insn per cycle
                                                  #   29.86  stalled cycles per insn  (66.68%)
         795305924      branches                  #   21.575 M/sec                    (66.64%)
          24605241      branch-misses             #    3.09% of all branches          (66.64%)

      37.305241359 seconds time elapsed
```

```
# Overhead  Command   Shared Object                  Symbol
# ........  ........  .............................  ............................................
#
    93.35%  glxgears  [kernel.vmlinux]               [k] copy_user_generic_string
     1.24%  glxgears  libnvidia-glcore.so.515.43.04  [.] _nv041glcore
     0.95%  glxgears  libnvidia-glcore.so.515.43.04  [.] _nv011glcore
     0.56%  glxgears  [vdso]                         [.] __vdso_clock_gettime
     0.41%  glxgears  libnvidia-glcore.so.515.43.04  [.] _nv023glcore
     0.23%  glxgears  [kernel.vmlinux]               [k] sched_clock_cpu
...
```

```
...
    93.12%    92.71%  glxgears  [kernel.vmlinux]  [k] copy_user_generic_string
            |
             --92.69%--0x5f53f3c000007fff
                       writev
                       entry_SYSCALL_64_after_hwframe
                       do_syscall_64
                       __x64_sys_writev
                       vfs_writev
                       do_iter_write.part.0
                       do_iter_readv_writev
                       sock_write_iter
                       unix_stream_sendmsg
                       skb_copy_datagram_from_iter
                       |
                       |--81.40%--copy_page_from_iter
                       |          copy_user_generic_string
                       |
                        --11.29%--_copy_from_iter
                                  copy_user_generic_string
...
```
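As a sanity check on the two `perf stat` summaries above, the "CPUs utilized" figure is just task-clock divided by elapsed wall time. A minimal Python sketch using the numbers copied from those summaries (the helper name `cpus_utilized` is made up for illustration):

```python
def cpus_utilized(task_clock_msec, elapsed_sec):
    """CPU time consumed per unit of wall time (1.0 = one core fully busy)."""
    return task_clock_msec / (elapsed_sec * 1000.0)


# Values taken from the perf stat output for plain glxgears (pid 30897)
# and for glxgears started via gamescope (pid 30991).
plain = cpus_utilized(539.04, 22.543913280)
nested = cpus_utilized(36861.88, 37.305241359)

print(f"glxgears alone:     {plain:.3f} CPUs utilized")
print(f"glxgears/gamescope: {nested:.3f} CPUs utilized")
print(f"ratio:              {nested / plain:.1f}x")
```

By this measure glxgears burns roughly 40 times more CPU per second of wall time under gamescope, which fits the profile above, where almost all samples land in `writev` on a Unix socket.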

This is on X11, I'm not sure if I correctly set everything up.

EDIT: same or related: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1317 (implicit synchronization issues on Nvidia, see comments) https://github.com/NVIDIA/open-gpu-kernel-modules/issues/187

sad-goldfish commented 2 years ago

Another perf report. Removed entries below ~1%.

perf report ``` 75.45% 0.05% glxgears [kernel.vmlinux] [k] entry_SYSCALL_64_after_hwframe | --75.40%--entry_SYSCALL_64_after_hwframe do_syscall_64 | |--73.84%--do_writev | vfs_writev | | | --73.84%--do_iter_write | | | --73.84%--do_iter_readv_writev | sock_write_iter | sock_sendmsg | unix_stream_sendmsg | | | --73.73%--skb_copy_datagram_from_iter | | | |--72.24%--copy_page_from_iter | | | | | --72.24%--copy_user_enhanced_fast_string | | | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | | | --2.62%--irq_entries_start | | | --1.48%--_copy_from_iter | | | --1.48%--copy_user_enhanced_fast_string | |--0.74%--__ia32_sys_sched_yield | | | --0.71%--schedule | | | --0.70%--__schedule | --0.69%--syscall_exit_to_user_mode 75.40% 0.02% glxgears [kernel.vmlinux] [k] do_syscall_64 | --75.38%--do_syscall_64 | |--73.84%--do_writev | vfs_writev | | | --73.84%--do_iter_write | | | --73.84%--do_iter_readv_writev | sock_write_iter | sock_sendmsg | unix_stream_sendmsg | | | --73.73%--skb_copy_datagram_from_iter | | | |--72.24%--copy_page_from_iter | | | | | --72.24%--copy_user_enhanced_fast_string | | | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | | | --2.62%--irq_entries_start | | | --1.48%--_copy_from_iter | | | --1.48%--copy_user_enhanced_fast_string | |--0.74%--__ia32_sys_sched_yield | | | --0.71%--schedule | | | --0.70%--__schedule | --0.69%--syscall_exit_to_user_mode 73.85% 0.00% glxgears libc.so.6 [.] 
writev | --73.85%--writev | --73.84%--entry_SYSCALL_64_after_hwframe do_syscall_64 do_writev vfs_writev | --73.84%--do_iter_write | --73.84%--do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] do_writev | ---do_writev vfs_writev | --73.84%--do_iter_write | --73.84%--do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] vfs_writev | ---vfs_writev | --73.84%--do_iter_write | --73.84%--do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] do_iter_write | --73.84%--do_iter_write do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] do_iter_readv_writev | ---do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | 
|--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] sock_write_iter | ---sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] sock_sendmsg | ---sock_sendmsg unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.84% 0.00% glxgears [kernel.vmlinux] [k] unix_stream_sendmsg | --73.83%--unix_stream_sendmsg | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.73% 0.00% glxgears [kernel.vmlinux] [k] skb_copy_datagram_from_iter | --73.73%--skb_copy_datagram_from_iter | |--72.24%--copy_page_from_iter | | | --72.24%--copy_user_enhanced_fast_string | | | |--2.84%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 73.72% 73.43% glxgears [kernel.vmlinux] [k] copy_user_enhanced_fast_string | --73.22%--0 writev entry_SYSCALL_64_after_hwframe do_syscall_64 do_writev vfs_writev do_iter_write do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg skb_copy_datagram_from_iter | |--71.74%--copy_page_from_iter | 
copy_user_enhanced_fast_string | | | |--2.79%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter copy_user_enhanced_fast_string 73.72% 0.00% glxgears [unknown] [k] 0000000000000000 | ---0 | --73.64%--writev | --73.64%--entry_SYSCALL_64_after_hwframe do_syscall_64 do_writev vfs_writev | --73.64%--do_iter_write | --73.63%--do_iter_readv_writev sock_write_iter sock_sendmsg unix_stream_sendmsg | --73.52%--skb_copy_datagram_from_iter | |--72.04%--copy_page_from_iter | | | --72.03%--copy_user_enhanced_fast_string | | | |--2.83%--asm_sysvec_apic_timer_interrupt | | | --2.62%--irq_entries_start | --1.48%--_copy_from_iter | --1.48%--copy_user_enhanced_fast_string 72.24% 0.00% glxgears [kernel.vmlinux] [k] copy_page_from_iter | --72.24%--copy_page_from_iter copy_user_enhanced_fast_string | |--2.84%--asm_sysvec_apic_timer_interrupt | --2.62%--irq_entries_start 6.35% 0.12% Xwayland [kernel.vmlinux] [k] entry_SYSCALL_64_after_hwframe | --6.23%--entry_SYSCALL_64_after_hwframe | --6.22%--do_syscall_64 | |--4.20%--__sys_recvmsg | | | --4.16%--___sys_recvmsg | | | --4.05%--____sys_recvmsg | | | --4.02%--unix_stream_recvmsg | | | --4.00%--unix_stream_read_generic | | | --3.38%--unix_stream_read_actor | | | --3.38%--skb_copy_datagram_iter | | | --3.38%--__skb_datagram_iter | | | --3.34%--_copy_to_iter | | | --3.28%--copy_user_enhanced_fast_string | --1.10%--syscall_exit_to_user_mode 6.24% 0.05% Xwayland [kernel.vmlinux] [k] do_syscall_64 | --6.19%--do_syscall_64 | |--4.20%--__sys_recvmsg | | | --4.16%--___sys_recvmsg | | | --4.05%--____sys_recvmsg | | | --4.02%--unix_stream_recvmsg | | | --4.00%--unix_stream_read_generic | | | --3.38%--unix_stream_read_actor | | | --3.38%--skb_copy_datagram_iter | | | --3.38%--__skb_datagram_iter | | | --3.34%--_copy_to_iter | | | --3.28%--copy_user_enhanced_fast_string | --1.10%--syscall_exit_to_user_mode 4.79% 0.02% Xwayland libc.so.6 [.] 
recvmsg | --4.79%--recvmsg | --4.53%--entry_SYSCALL_64_after_hwframe | --4.50%--do_syscall_64 | --4.20%--__sys_recvmsg | --4.16%--___sys_recvmsg | --4.05%--____sys_recvmsg | --4.02%--unix_stream_recvmsg | --4.00%--unix_stream_read_generic | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 4.20% 0.01% Xwayland [kernel.vmlinux] [k] __sys_recvmsg | --4.19%--__sys_recvmsg | --4.16%--___sys_recvmsg | --4.05%--____sys_recvmsg | --4.02%--unix_stream_recvmsg | --4.00%--unix_stream_read_generic | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 4.16% 0.02% Xwayland [kernel.vmlinux] [k] ___sys_recvmsg | --4.14%--___sys_recvmsg | --4.05%--____sys_recvmsg | --4.02%--unix_stream_recvmsg | --4.00%--unix_stream_read_generic | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 4.05% 0.02% Xwayland [kernel.vmlinux] [k] ____sys_recvmsg | --4.03%--____sys_recvmsg | --4.02%--unix_stream_recvmsg | --4.00%--unix_stream_read_generic | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 4.02% 0.01% Xwayland [kernel.vmlinux] [k] unix_stream_recvmsg | --4.00%--unix_stream_recvmsg | --4.00%--unix_stream_read_generic | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 4.00% 0.06% Xwayland [kernel.vmlinux] [k] unix_stream_read_generic | --3.94%--unix_stream_read_generic | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 
3.38% 0.00% Xwayland [kernel.vmlinux] [k] unix_stream_read_actor | --3.38%--unix_stream_read_actor | --3.38%--skb_copy_datagram_iter | --3.38%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 3.38% 0.00% Xwayland [kernel.vmlinux] [k] skb_copy_datagram_iter | --3.38%--skb_copy_datagram_iter __skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 3.38% 0.03% Xwayland [kernel.vmlinux] [k] __skb_datagram_iter | --3.34%--__skb_datagram_iter | --3.34%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 3.34% 0.05% Xwayland [kernel.vmlinux] [k] _copy_to_iter | --3.30%--_copy_to_iter | --3.28%--copy_user_enhanced_fast_string 3.32% 3.29% Xwayland [kernel.vmlinux] [k] copy_user_enhanced_fast_string | --3.27%--recvmsg entry_SYSCALL_64_after_hwframe do_syscall_64 __sys_recvmsg ___sys_recvmsg | --3.27%--____sys_recvmsg unix_stream_recvmsg unix_stream_read_generic unix_stream_read_actor skb_copy_datagram_iter __skb_datagram_iter _copy_to_iter | --3.26%--copy_user_enhanced_fast_string 2.84% 0.00% glxgears [kernel.vmlinux] [k] asm_sysvec_apic_timer_interrupt | ---asm_sysvec_apic_timer_interrupt 2.62% 0.00% glxgears [kernel.vmlinux] [k] irq_entries_start | ---irq_entries_start 2.55% 0.00% glxgears [unknown] [k] 0x89495541f6894956 | ---0x89495541f6894956 0x7f3f536b0a40 | --2.26%--__sched_yield | --1.48%--entry_SYSCALL_64_after_hwframe | --1.44%--do_syscall_64 | |--0.74%--__ia32_sys_sched_yield | | | --0.71%--schedule | | | --0.70%--__schedule | --0.66%--syscall_exit_to_user_mode 2.55% 0.00% glxgears libnvidia-glcore.so.515.43.04 [.] 0x00007f3f536b0a40 | ---0x7f3f536b0a40 | --2.26%--__sched_yield | --1.48%--entry_SYSCALL_64_after_hwframe | --1.44%--do_syscall_64 | |--0.74%--__ia32_sys_sched_yield | | | --0.71%--schedule | | | --0.70%--__schedule | --0.66%--syscall_exit_to_user_mode 2.27% 0.02% glxgears libc.so.6 [.] 
__sched_yield | --2.24%--__sched_yield | --1.48%--entry_SYSCALL_64_after_hwframe | --1.44%--do_syscall_64 | |--0.74%--__ia32_sys_sched_yield | | | --0.71%--schedule | | | --0.70%--__schedule | --0.66%--syscall_exit_to_user_mode 2.16% 0.00% Xwayland [unknown] [.] 0000000000000000 | ---0 | --0.66%--0x7fe9e57500a3 1.61% 0.00% Xwayland [unknown] [k] 0x0000564419f1f020 | ---0x564419f1f020 | --1.54%--epoll_wait | --0.97%--__x64_sys_epoll_wait | --0.96%--do_epoll_wait | --0.62%--schedule_hrtimeout_range_clock | --0.52%--schedule | --0.50%--__schedule 1.56% 0.02% Xwayland libc.so.6 [.] epoll_wait | --1.54%--epoll_wait | --0.97%--__x64_sys_epoll_wait | --0.96%--do_epoll_wait | --0.62%--schedule_hrtimeout_range_clock | --0.52%--schedule | --0.50%--__schedule 1.48% 0.00% glxgears [kernel.vmlinux] [k] _copy_from_iter | ---_copy_from_iter | --1.48%--copy_user_enhanced_fast_string ```
sad-goldfish commented 2 years ago

Is this where the slowdown happens?

EDIT: Nope, not this one.

kakra commented 2 years ago

This problem seems to be gone with 515.48.07, but tearing is still terrible. Vsync seems generally broken with NVIDIA (at least for me: forcing the composition pipeline does not help, and I usually run without vsync now because otherwise it causes extreme stutter without eliminating tearing). Without gamescope I see just one line of tearing, but with gamescope I see a whole block of tearing zigzagging across part of the screen, as if you took zigzag scissors to cut between two frames. This could be partially explained by the issues linked by @pchome in https://github.com/Plagman/gamescope/issues/495#issuecomment-1126800982.

pchome commented 2 years ago

@kakra FYI: the `KWIN_X11_FORCE_SOFTWARE_VSYNC=1` and `KWIN_X11_NO_SYNC_TO_VBLANK=1` environment variables may help a bit with the kwin compositor. There was a knob (or setting) in KDE for vsync, but it was removed a while ago.

kakra commented 2 years ago

@pchome It looks like putting `Option "ForceCompositionPipeline" "on"` into `Section "Device"` fixes most of the issues. Strangely, it didn't work properly before. Also, just enabling this in nvidia-settings doesn't seem to be enough.

somewhatfrog commented 1 year ago

I have the same issue on Nvidia, regardless of driver version and with or without gamescope.

  1. Weird "mixed" frame pacing under Wayland (same as the author described).
  2. Stutters under Wayland, and under Xorg with ForceCompositionPipeline. The stutters seem to be tied to the camera position or a place in the game, but going into the inventory/map (The Ascent) or opening the map (Elden Ring) fixes the stutters for that particular place. (Note: the frametime graph shows no spikes at all.)
  3. In windowed/tiled mode, games with vsync have tearing; the line always appears at the same spot, at about 2/7 of the screen.
  4. Vsync works perfectly in gamescope only if started with -f. Without gamescope, it works only in the game's native fullscreen mode.

Arch 6.0, NVIDIA 520, 60 Hz screen, 5600X, 3060 Ti

Something tells me this is Nvidia's problem. Will post there I guess.