ValveSoftware / csgo-osx-linux

Counter-Strike: Global Offensive
http://counter-strike.net

CS2 runs very well but gameplay experiences short hangs perceived as a slow frame (150ms+ spike top right) and also network latency spikes around the same amount. The system shows no problem and neither does its networking (fibre). #3717

Open ipaqmaster opened 1 month ago

ipaqmaster commented 1 month ago

Your system information

No hardware change since last issue: https://gist.github.com/ipaqmaster/c89209d95358fb321b4fab54003ab418

Yes

Please describe your issue in as much detail as possible:

Describe what you expected should happen and what did happen. Please link any large pastes as a Github Gist.

Steps for reproducing this issue:

  1. Play MM (Including the game's new pre-join shader compilation)

The game plays well, but occasionally the frame time shoots up to 150+ milliseconds in a hard hang/stutter, and the game's network latency also occasionally spikes to multiple hundreds of milliseconds, resulting in a hard rubber-banding experience for a brief moment.


I've put some assumptive troubleshooting into this with some overzealous scripting which:

  1. Calls systemctl set-property on system.slice and init.scope, restricting their AllowedCPUs to CPU threads 11 and 23 (the final core's two hyperthreads) so that background tasks are pushed into a corner, away from CS2's many busy threads.
  2. Sets user.slice's AllowedCPUs to the remaining threads of the socket (0-10,12-22), which lets all other userspace processes such as CS2 execute on these cores while background tasks are confined to the last core, alone.
  3. Sets the cores designated to userspace (0-10,12-22) to the performance governor, keeping their clocks stable around ~4GHz, while the last core for background tasks remains on schedutil.
  4. Finds all cs2 PIDs, including all child task PIDs, and gives them a higher scheduling priority (not quite realtime) combined with FIFO (First In First Out) scheduling on the game's threads.

I've found this helps a lot.
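The four steps above can be sketched roughly as follows. This is a minimal sketch rather than the actual script: the `--runtime` flag, the use of `cpupower`, the FIFO priority of 50 and the helper names are all assumptions, and by default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the CPU isolation steps. CPU lists match the 24-thread layout
# described above; everything else here is an assumption for illustration.
BG_CPUS="11,23"          # last core's two threads, for background tasks
GAME_CPUS="0-10,12-22"   # everything else, for user.slice (and CS2)
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands (default)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

isolate_cpus() {
    # --runtime (an assumption) keeps the change until reboot only.
    run systemctl set-property --runtime system.slice "AllowedCPUs=$BG_CPUS"
    run systemctl set-property --runtime init.scope "AllowedCPUs=$BG_CPUS"
    run systemctl set-property --runtime user.slice "AllowedCPUs=$GAME_CPUS"
    run cpupower -c "$GAME_CPUS" frequency-set -g performance
}

boost_game_threads() {
    # $1: exact process name (e.g. cs2), $2: SCHED_FIFO priority (1-99)
    for pid in $(pgrep -x "$1"); do
        for task in /proc/"$pid"/task/*; do
            [ -d "$task" ] || continue
            run chrt -f -p "$2" "${task##*/}"
        done
    done
}
```

Running `isolate_cpus` followed by `boost_game_threads cs2 50` with `DRY_RUN=0` (as root) would apply the changes; with the default `DRY_RUN=1` the commands are only echoed so they can be reviewed first.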

Even though this AMD 3900X CPU isn't impacted by all of the known CPU vulnerabilities, I found that playing with mitigations=off in the kernel's boot arguments further reduces, and sometimes fully eliminates, the out-of-nowhere hangs the game experiences during gameplay.
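To confirm whether a given boot actually has mitigations disabled, something like the following could be used. It is a small sketch with hypothetical helper names; it reads /proc/cmdline and the kernel's per-vulnerability status files under /sys.

```shell
# Succeeds if a kernel command line contains mitigations=off.
# Reads /proc/cmdline by default; pass a string to check something else.
mitigations_off() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" mitigations=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Per-vulnerability status as reported by the running kernel.
show_vulnerabilities() {
    grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null
}
```

For example, `mitigations_off && echo "mitigations are off"` followed by `show_vulnerabilities` shows both the boot argument and what the kernel reports per vulnerability.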

During the game's seemingly random network latency spikes (the real network shows no evidence of these events; only CS2 experiences them) and seemingly random frame rendering spikes, there are no logs in dmesg -wH or sudo journalctl -f that point blame at any process or scheduled task in particular.

The game just seems to hang rendering a frame every so often and the equivalent for its networking. The desktop experience is flawless and so are other games. I don't know why it has trouble like this.

ipaqmaster commented 1 month ago

I should work on my Titles.

ipaqmaster commented 1 month ago

It seems steamwebhelper is maxing out 4 CPU threads at 100%. Every few seconds another of its threads joins in, maxing out a total of 5 CPU threads. This CPU has 24 threads available so I'm hoping this isn't a major contributor, but I want to eliminate it to lessen the overall system load in the hope of fixing this problem.

I notice it's running with --disable-gpu-compositing --disable-gpu. I've gone to Steam's settings and toggled on "Enable GPU accelerated rendering in web views (requires restart)" to hopefully move this 4-5 thread 100% CPU load from that process to the GPU. Doing this caused my /data/steam library to become undefined in Steam, making it look like my games were uninstalled. Trying to add the library still showed an empty list where directories are supposed to be, so it was necessary to launch with steam -console and run library_folder_add /data/steam in that console to do the trick.

I also (shamefully) noticed via atop 1 that syncoid was being invoked in the background every so often, causing a bunch of near-realtime ZFS snapshots to be sent at 1Gbps to the nearby NAS. I was able to reproduce the frame time increases by running iperf3 -s on the router and iperf3 -c routerHostname -P20 to push data to it, maxing out my 1Gbps Ethernet link. Looking at lspci -v revealed the Ethernet controller on this motherboard has a lower IRQ than the NVIDIA graphics card, which was probably part of the cause when the interface pushed out the occasional snapshot over a few seconds.

Temporarily suspending syncoid's replication of this desktop while cs2 is running is without a doubt improving things here... but the frame time is still increasing periodically (often perceived as a 1s+ freeze followed by rubber-banding from the server) despite there not being much other CPU or PCIe load to blame at this point.

I'll continue digging. But enabling GPU acceleration in Steam's settings has cleaned up the top process list as seen by htop -d 0.1 a ton. I can only see cs2 as the top process now.

ipaqmaster commented 1 month ago

The latest update has brought many cool new features and some from CS:GO's era. But tonight I found the game was rubber-banding constantly, every few seconds during all fights, and the frame time was spiking to 150ms+ constantly with stutters (followed by more rubber-banding due to the stuttering).

It was so bad. But the CPU, GPU and other system load was flawless.

It's frustrating to make seemingly random engine comparisons. But it's incredible how smoothly Overwatch 2 runs on this PC having picked it up again the other week. Not a hitch in sight and butter smooth all the way through a match without any tweaks or service-culling whatsoever. Is CS2 just not at its full potential yet on Linux?

ipaqmaster commented 1 month ago

Just the other night, with Dust 2 added to Premier's map pool and some other cool new features, was the worst I've seen to date. I was practically unable to play with how much trouble the game was having.

This is a fully idle system capable of some great things in other titles which use up its cores, GPU and other compute. But since this update that was the worst and most frustrating match of my career. The stuttering, rubber-banding and GPU frame time drops / short hangs didn't stop for more than 5 seconds at any given moment. Peeking with an advantage was made dangerous.

bblacher commented 1 month ago

I am experiencing similar frametime spikes. For me it started two days ago on Arch Linux; I think the game update from the 30th of April is the problem in my case. I also can't search for matches since that update, as VAC always fails to verify my session.

bblacher commented 1 month ago

Alright, I solved my issue: it turns out it was a USB device that was getting reconnected every few seconds. Check your journal, everyone!
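One way to check the journal for that kind of reconnect loop is to count kernel "USB disconnect" messages per port. This is a minimal sketch with a hypothetical helper name; it assumes the usual kernel log wording ("usb 1-2: USB disconnect, device number 5") and reads log text from stdin.

```shell
# Count "USB disconnect" kernel messages per USB port from log text on stdin.
# Prints one "port count" line per port, sorted by port name.
usb_disconnects_by_port() {
    awk '/USB disconnect/ {
        for (i = 1; i < NF; i++)
            if ($i == "usb") {
                port = $(i + 1)        # e.g. "1-2:"
                sub(/:$/, "", port)    # drop the trailing colon
                n[port]++
                break
            }
    }
    END { for (p in n) print p, n[p] }' | sort
}
```

For example, `journalctl -k --since "10 minutes ago" | usb_disconnects_by_port` or `dmesg | usb_disconnects_by_port`. A port with a count that keeps growing while nothing is being unplugged would be worth a closer look.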

ipaqmaster commented 3 weeks ago

Nothing like that for me. There are no logs in dmesg -wH or sudo journalctl -f during the moments the game's frame time or network latency spikes.

I am now running kernel 6.8.9 and nvidia 550.78. I continue to stop most background storage snapshotting and sending, metric and security services, and to isolate non-critical services onto their own CPUs, to give CS2 its best chance of performing well. The 3900X CPU (24 threads) is ~40% utilized at most during any moment of gameplay, and typically less.

I have to restate that another FPS title "Overwatch 2" doesn't have this problem while pushing its own uncapped framerates and CPU plus GPU utilization. It's discouraging.

ipaqmaster commented 3 weeks ago

I went nuclear, pacstrapping a new ZFS rootfs dataset to boot into, installing only the bare minimum packages I would need and setting a locale to get started (lightdm, lightdm-gtk-greeter, pipewire, pipewire-pulse, wireplumber, NetworkManager, openssh, steam, obs, zfs-dkms, nvidia-dkms). I added a user for myself and downloaded the game.

I didn't notice the hard hanging the game has been exhibiting in my usual boot environment, which points to something my regular rootfs is doing in the background as the cause.

I'll continue trying to isolate exactly what's causing this because at this point system stats are close to idle during gameplay. Though not as idle as that brand new cs2 testing rootfs which had only 50 or so processes spawned system-wide.

ipaqmaster commented 1 week ago

Gave up on this for now. Could not pin it on anything in the software or hardware. Perhaps it was just wishful thinking that it "seemed" better on a fresh installation.

Waiting for future patches. Everything else about the desktop experience and other titles (even FPSes) is flawless. Bummer.

luisalvarado commented 1 week ago

Watch this video

https://www.youtube.com/watch?v=2FBnTa33jSQ

It changed my perceived notion of what the social media outlets say about snap versus what it truly is becoming.

ipaqmaster commented 6 days ago

I think I've got it. I went to replay The Talos Principle and noticed it was stuttering itself. Finally something else that stutters. So I dug into this again.

I had the computer flood a terminal with recent processes using while : ; do ps aux | tail -n5 ; done, watching it like a hawk on the other monitor, and then grepped irrelevant processes out of this 'top 5 newest processes' feed, leaving me with a blank terminal until a new process appeared. Whenever The Talos Principle stuttered I noticed lsof was spawning with arguments like this:

/usr/sbin/lsof -P -F upnR -i TCP@127.0.0.1:59912
/usr/sbin/lsof -P -F upnR -i TCP@127.0.0.1:59916
/usr/sbin/lsof -P -F upnR -i TCP@127.0.0.1:37774

This happened repeatedly, every 0-5 seconds. These processes were spawning 1:1 with the stuttering in game. They come and go so quickly that it was difficult to keep track of them. I managed to trim out their PIDs and pass them to ps -o ppid= $pid to find their parent process.

It's the primary Steam pid.
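The hunt described above can be sketched as a small polling loop. This is a best-effort sketch with hypothetical helper names, not the exact commands that were used: it reads the parent PID from /proc (the same answer `ps -o ppid=` gives) and, because it polls, very short-lived processes can still slip between passes.

```shell
# Parent PID of a process, read from /proc/PID/status.
# Prints nothing if the process has already exited.
parent_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Poll /proc for processes named exactly $1 (e.g. lsof) and print
# "pid ppid parent-name" for each one seen. Runs until interrupted.
watch_spawns() {
    while :; do
        for f in $(grep -lx -- "$1" /proc/[0-9]*/comm 2>/dev/null); do
            pid="${f#/proc/}"; pid="${pid%/comm}"
            ppid="$(parent_of "$pid")"
            [ -n "$ppid" ] && echo "$pid $ppid $(cat "/proc/$ppid/comm" 2>/dev/null)"
        done
    done
}
```

Running `watch_spawns lsof` on a second monitor while the game stutters should print the parent's name next to each lsof that is caught.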

I temporarily removed /usr/sbin/lsof with: mv -nv /usr/sbin/lsof /usr/sbin/lsof.bak but lsof was still spawning, now without its full path. I traced this new lsof binary to: ~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/bin/lsof. Steam has a backup plan.

So I renamed that too. The stuttering completely stopped in The Talos Principle.


I mv'd them back and the lsof's resumed spawning periodically. I opened CS2 and experienced the stuttering I've been dealing with and they lined up 1:1 with the near instant creation and vanishing of these lsof pids.

I renamed them again to their .bak names to prevent Steam from spawning any of them and the deathmatch server I was in did not stutter anymore. (!) I'm not seeing performance pop-up metrics in the top right corner now either.

I don't know what Steam is using lsof to look for here but it's clearly interrupting something important. I tried the same commands in an infinitely fast while loop in my shell but was unable to reproduce the stuttering they cause when Steam spawns them.

TL;DR: preventing Steam from executing lsof by renaming /usr/sbin/lsof and ~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/bin/lsof has, I think, entirely resolved my stuttering problem.
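Since this rename is only a temporary test, it helps to make it easy to undo. A minimal sketch with hypothetical helper names, wrapping the same mv -nv used above:

```shell
# Move a binary aside (PATH -> PATH.bak) and put it back later.
# -n refuses to overwrite an existing target; -v prints what was moved.
stash_bin()   { [ -e "$1" ] && mv -nv -- "$1" "$1.bak"; }
restore_bin() { [ -e "$1.bak" ] && mv -nv -- "$1.bak" "$1"; }

# Usage (the system copy needs root, the Steam runtime copy does not):
#   stash_bin /usr/sbin/lsof
#   stash_bin ~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/bin/lsof
#   ...play and observe, then call restore_bin on the same two paths.
```

It is an assumption that the files stay renamed: a package update, or possibly a Steam runtime update, may put them back.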

ipaqmaster commented 6 days ago

Yep, I put both Steam's and the system's lsof back and the top-right performance metrics resumed popping up, with unbearable stuttering while trying to shoot stuff and just run around.

This cannot be a permanent solution for me so I'd like to figure out why Steam's doing this every few seconds and what process it's expecting to find on these seemingly random ports.

-P and -F only affect output formatting (-P keeps port numbers numeric and -F selects machine-readable field output), and -i specifies a search, here looking for TCP connections to localhost on some port. Running the same command with only TCP@127.0.0.1 at the end reveals some typical inter-process TCP communication on my host.

But I don't know why Steam needs to look for something this way and why CS2 (And Talos, single player) stutters when Steam does this but not when I run the same command.
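For reading what that lsof call returns: with -F upnR, lsof prints one field per line, each prefixed by a letter (p for PID, R for parent PID, u for user ID, n for the name, which is the connection endpoints for a socket). A small sketch, with a hypothetical helper name, that folds that output into one row per connection:

```shell
# Turn `lsof -F upnR` field output (stdin) into "pid ppid uid name" rows.
# Each input line is one field: an identifying letter, then its value.
lsof_fields_to_rows() {
    awk '
        /^p/ { pid = substr($0, 2); ppid = ""; uid = "" }  # new process set
        /^R/ { ppid = substr($0, 2) }
        /^u/ { uid = substr($0, 2) }
        /^n/ { print pid, ppid, uid, substr($0, 2) }       # one row per file
    '
}
```

For example, `lsof -P -F upnR -i TCP@127.0.0.1 | lsof_fields_to_rows` lists which processes hold loopback TCP connections, which may help identify what Steam expects to find on those ports.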

lostgoat commented 3 days ago

@ipaqmaster That should only be running whenever a steamwebhelper tries to set up an IPC channel with the steam app. That should only happen at startup, but if it fails to connect it will try again every once in a while.

This is why you were also running into this other issue: https://github.com/ValveSoftware/steam-for-linux/issues/10504

Can you reply in that other issue with your Steam -> Help -> System Information paste? That might help us track down what is different about your system configuration that is preventing this connection from being established. Additionally, if you can let us know where you installed steam from (e.g. flatpak vs package manager), and any other information you think might be non-standard about your steam installation, it might help us narrow down the problem and address it on our side.

ipaqmaster commented 3 days ago

I see. I've put the gist and below details in that other issue.

I got Steam from the official Arch Linux multilib package repo with its package manager.

It appears as multilib/steam 1.0.0.79-2 [installed]

I have a short iptables ruleset deployed to my desktop/laptop devices which is mostly open. I can try wiping that in case a rule is blocking IPC between these two, though connections from the lo interface have an explicit -j ACCEPT rule in there.
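A quick way to confirm the loopback rule is present before wiping anything, as a minimal sketch with a hypothetical helper name. It only looks for the plain rule text in iptables -S output and does not evaluate rule order, so an earlier matching rule could still take precedence.

```shell
# Succeeds if rules on stdin (in `iptables -S` format) include an
# unconditional ACCEPT for traffic arriving on the loopback interface.
has_lo_accept() {
    grep -Eq '^-A INPUT -i lo -j ACCEPT$'
}

# Usage: sudo iptables -S INPUT | has_lo_accept && echo "lo is accepted"
```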

lostgoat commented 1 day ago

@ipaqmaster Can you update to the latest steam beta client and then collect this info once you have steam running: