ipaqmaster opened this issue 7 months ago.
I should work on my Titles.
It seems `steamwebhelper` is maxing out 4 CPU threads at 100%. Every few seconds another of its threads joins in, maxing out a total of 5 CPU threads. This CPU has 24 threads available, so I'm hoping this isn't a major contributor, but I want to eliminate it to lessen the overall system load in hopes of fixing this problem.
I notice it's running with `--disable-gpu-compositing --disable-gpu`. I've gone to Steam's settings and toggled on "Enable GPU accelerated rendering in web views (requires restart)" to hopefully move this 4-5 thread, 100% CPU load from that process off to the GPU. Doing this caused my `/data/steam` dataset to become undefined in Steam, making it look like my games were uninstalled. Trying to add the library again still showed an empty list where the directories are supposed to be, so I had to launch with `steam -console` and run `library_folder_add /data/steam` in that console to do the trick.
I also (shamefully) noticed via `atop 1` that `syncoid` was running in the background every so often, sending a bunch of near-realtime ZFS snapshots at 1 Gbps to the nearby NAS. I was able to reproduce the frame time increases by running `iperf3 -s` on the router and `iperf3 -c routerHostname -P20` to push data to it, maxing out my 1 Gbps Ethernet link. Looking at `lspci -v` revealed that the Ethernet controller on this motherboard has a lower IRQ than the NVIDIA graphics card, which was probably part of the cause whenever the interface pushed out the occasional snapshot over a few seconds.
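To see which IRQ lines the NIC and the NVIDIA card actually use, and how many interrupts each is taking, `/proc/interrupts` is more direct than `lspci -v`. Below is a minimal Python sketch that parses it and filters rows by device name. The `nvidia` and `enp` filters are assumptions about how the devices show up on this machine; adjust them to match your own `/proc/interrupts`.

```python
from pathlib import Path

def parse_interrupts(text):
    """Parse /proc/interrupts into {irq_label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # first non-numeric column starts the controller/device description
            counts.append(int(field))
        table[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table

def find_devices(table, *needles):
    """Keep rows whose description mentions any of the given substrings."""
    return {irq: row for irq, row in table.items()
            if any(needle in row[1] for needle in needles)}

if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():
        # "nvidia" and "enp" are example substrings (assumed device names)
        rows = find_devices(parse_interrupts(path.read_text()), "nvidia", "enp")
        for irq, (total, desc) in rows.items():
            print(f"IRQ {irq:>4}: {total:>12} interrupts  {desc}")
```

Running it before and after an `iperf3` flood shows which line's counters climb and on which device.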
Temporarily suspending `syncoid`'s replication of this desktop while `cs2` is running will undoubtedly improve things here... but the frame time latency still increases periodically (often perceived as a 1s+ freeze followed by rubber-banding from the server), despite there being little other CPU or PCIe load to blame at this point.
I'll continue digging. But enabling GPU acceleration in Steam's settings has cleaned up the top process list in `htop -d 0.1` a ton. I can only see cs2 as the top process now.
The latest update has brought many cool new features, some from CS:GO's era. But tonight I found the game rubber-banding constantly, every few seconds, during all fights, and the frame time was spiking to 150ms+ constantly with stutters (followed by more rubber-banding due to the stuttering).
It was so bad. But the CPU, GPU and other system load was flawless.
It's frustrating to make seemingly random engine comparisons. But it's incredible how smoothly Overwatch 2 runs on this PC having picked it up again the other week. Not a hitch in sight and butter smooth all the way through a match without any tweaks or service-culling whatsoever. Is CS2 just not at its full potential yet on Linux?
Just the other night, with Dust 2 added to Premier's map pool and some other cool new features, this was the worst I've ever seen to date. I was practically unable to play with how much trouble the game was having.
A fully idle system, capable of great things in other titles that use up its cores, GPU and other compute. But since this update, that was the worst and most frustrating match of my career. The stuttering, rubber-banding and GPU frame time drops / short hangs didn't stop for more than 5 seconds at any given moment. Peeking with an advantage was made dangerous.
I am experiencing similar frametime spikes. For me it started two days ago on Arch Linux, I think the game update from the 30th of April is the problem in my case. I also can't search for matches since that update, VAC always fails to verify my session.
Alright, I solved my issue: it turns out it was a USB device that was getting reconnected every few seconds. Check your journal, everyone!
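For anyone wanting to check for the same culprit, here is a minimal Python sketch that counts kernel `USB disconnect` messages per port in the current boot's journal. It assumes the usual kernel wording (`usb 1-4: USB disconnect, device number 7`) and that `journalctl` is installed; a port that disconnects every few seconds will stand out at the top of the list.

```python
import re
import shutil
import subprocess
from collections import Counter

# Kernel messages look like: "usb 1-4: USB disconnect, device number 7"
DISCONNECT = re.compile(r"usb (\S+): USB disconnect, device number \d+")

def count_usb_disconnects(lines):
    """Count USB disconnect events per port path (e.g. '1-4')."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__" and shutil.which("journalctl"):
    # -k: kernel messages only, -b: current boot, -o cat: message text only
    out = subprocess.run(["journalctl", "-k", "-b", "-o", "cat"],
                         capture_output=True, text=True).stdout
    for port, n in count_usb_disconnects(out.splitlines()).most_common():
        print(f"usb {port}: disconnected {n} times this boot")
```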
Nothing like that for me. There are no logs in `dmesg -wH` or `sudo journalctl -f` at the moments the game's frame time or network latency spikes.
I am now running kernel `6.8.9` and NVIDIA `550.78`. I continue to stop most background storage snapshotting+sending, metrics and security services, and to isolate non-critical services onto separate CPUs, to give CS2 its best chance at performing. The 3900X CPU (24 threads) is at most ~40% utilized at any moment during gameplay, and typically less.
I have to restate that another FPS title "Overwatch 2" doesn't have this problem while pushing its own uncapped framerates and CPU plus GPU utilization. It's discouraging.
I went nuclear, `pacstrap`ing a new ZFS rootfs dataset to boot into, installing only the bare minimum packages I would need and setting a locale to get started (lightdm, lightdm-gtk-greeter, pipewire, pipewire-pulse, wireplumber, NetworkManager, openssh, steam, obs, zfs-dkms, nvidia-dkms). I added a user for myself and downloaded the game.
I didn't notice the hard hanging the game has been doing in my usual boot environment, which points to something my regular rootfs is doing in the background as the cause.
I'll continue trying to isolate exactly what's causing this because at this point system stats are close to idle during gameplay. Though not as idle as that brand new cs2 testing rootfs which had only 50 or so processes spawned system-wide.
Gave up on this for now. I could not pin it on anything in the software or hardware. Perhaps it was just wishful thinking that it "seemed" better on a fresh installation.
Waiting for future patches. Everything else about the desktop experience and other titles (even FPSes) is flawless. Bummer.
Watch this video
https://www.youtube.com/watch?v=2FBnTa33jSQ
It changed my perceived notion of what the social media outlets say about snap versus what it truly is becoming.
I think I've got it. I went to replay The Talos Principle and noticed it was stuttering itself. Finally something else that stutters. So I dug into this again.
I had the computer flood a terminal with recent processes using `while : ; do ps aux | tail -n5 ; done`, watching it like a hawk on the other monitor, and then grepped out irrelevant processes from this 'top 5 newest processes' feed, leaving me with a blank terminal until it noticed a new process. Whenever The Talos Principle stuttered, I noticed `lsof` was spawning with arguments like this:
/usr/sbin/lsof -P -F upnR -i TCP@127.0.0.1:59912
/usr/sbin/lsof -P -F upnR -i TCP@127.0.0.1:59916
/usr/sbin/lsof -P -F upnR -i TCP@127.0.0.1:37774
This happened repeatedly, every 0-5 seconds. These processes were spawning 1:1 with the in-game stuttering. They come and go so quickly that it was difficult to keep track of them. I managed to trim out their PIDs and pipe them into `ps -o ppid= $pid` to find their parent process.
It's the primary Steam PID.
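The same hunt can be automated. The following is a minimal Python sketch (not the script used above) that polls `/proc` for newly appeared PIDs and, for any whose command name matches, prints its parent PID and command line. The function names, file name and polling interval are illustrative. Because it polls, a process that lives for less than one interval, or that is caught between `fork` and `exec`, can still slip through.

```python
import sys
import time
from pathlib import Path

def parse_stat(text):
    """Return (pid, comm, ppid) from the contents of /proc/<pid>/stat."""
    pid, _, rest = text.partition(" (")
    # comm may itself contain spaces or ')', so split on the LAST ") "
    comm, _, after = rest.rpartition(") ")
    _state, ppid = after.split()[:2]
    return int(pid), comm, int(ppid)

def read_cmdline(proc_dir):
    """Arguments in /proc/<pid>/cmdline are NUL-separated."""
    raw = (proc_dir / "cmdline").read_bytes()
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()

def watch(name, interval=0.005):
    """Poll /proc and print each newly appeared process whose comm equals `name`."""
    previous = set()
    while True:
        current = {p for p in Path("/proc").iterdir() if p.name.isdigit()}
        for proc_dir in current - previous:
            try:
                pid, comm, ppid = parse_stat((proc_dir / "stat").read_text())
                if comm == name:
                    parent = parse_stat(Path(f"/proc/{ppid}/stat").read_text())[1]
                    print(f"{time.strftime('%H:%M:%S')} pid={pid} "
                          f"ppid={ppid} ({parent}): {read_cmdline(proc_dir)}")
            except OSError:
                continue  # the process (or its parent) exited before we read it
        previous = current
        time.sleep(interval)

if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])  # e.g. python3 watch_spawns.py lsof
```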
I temporarily removed `/usr/sbin/lsof` with `mv -nv /usr/sbin/lsof /usr/sbin/lsof.bak`, but `lsof` was still spawning, now without its full path. I traced this new lsof binary to `~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/bin/lsof`. Steam has a backup plan.
So I renamed that too. The stuttering completely stopped in The Talos Principle.
I `mv`'d them back and the `lsof` processes resumed spawning periodically. I opened CS2 and experienced the stuttering I've been dealing with, and it lined up 1:1 with the near-instant creation and vanishing of these `lsof` PIDs.
I renamed them again to their .bak names to prevent Steam from spawning any of them and the deathmatch server I was in did not stutter anymore. (!) I'm not seeing performance pop-up metrics in the top right corner now either.
I don't know what Steam is using `lsof` to look for here, but it's clearly interrupting something important. I tried the same commands in an infinitely fast while loop in my shell but was unable to reproduce the stuttering they cause when Steam spawns them.
TL;DR: disallowing Steam from executing `lsof`, by renaming `/usr/sbin/lsof` and `~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/bin/lsof`, has, I think, entirely resolved my stuttering problem.
Yep: I put both Steam's and the system's `lsof` back, and the top-right performance metrics resumed popping up, with unbearable stuttering while trying to shoot stuff and just run around.
This cannot be a permanent solution for me so I'd like to figure out why Steam's doing this every few seconds and what process it's expecting to find on these seemingly random ports.
`-P` (don't convert port numbers to service names) and `-F` (field-formatted output) are just cosmetics and formatting, and `-i` specifies a search for TCP connections to localhost on some port. Running the same command with only `TCP@127.0.0.1` at the end reveals some typical inter-process TCP communication on my host.
But I don't know why Steam needs to look for something this way and why CS2 (And Talos, single player) stutters when Steam does this but not when I run the same command.
@ipaqmaster That should only be running whenever a `steamwebhelper` tries to set up an IPC channel with the `steam` app, which should only happen at startup. But if it fails to connect, it will try again every once in a while.
This is why you were also running into this other issue: https://github.com/ValveSoftware/steam-for-linux/issues/10504
Can you reply in that other issue with your `Steam -> Help -> System Information` paste? That might help us track down what is different about your system configuration that is preventing this connection from being established. Additionally, if you can let us know where you installed Steam from (e.g. flatpak vs package manager), and any other information you think might be non-standard about your Steam installation, it might help us narrow down the problem and address it on our side.
I see. I've put the gist and the details below in that other issue.
I got Steam from the official Arch Linux `multilib` package repo with its package manager.
It appears as `multilib/steam 1.0.0.79-2 [installed]`.
I have a short iptables ruleset deployed to my desktop/laptop devices which is mostly open. I can try wiping it in case a rule is blocking IPC between these two, though connections from the `lo` interface have an explicit `-j ACCEPT` rule in there.
@ipaqmaster Can you update to the latest Steam beta client and then collect this info once you have Steam running:
- `pstree`
- `~/.steam/steam/logs/transport_steamui.txt`
Things came up, apologies for not getting around to this sooner. I'll try to get that info shortly.
I think this is the closest report to what I'm seeing. I'll get a solid 120 FPS (or more, though I've got vsync on). Then (vsync on or off, it doesn't matter) I'll see drops of as much as a 40ms max frame time reported by the in-game FPS meter (i.e., a brief drop to the frame time equivalent of a 25 FPS frame rate).
It's very ... noticeable, even with adaptive sync on, it can't make it feel smooth.
7950X and 7900 XTX
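For comparing the numbers in this thread, frame time and frame rate are related by fps = 1000 / frame_time_ms. A tiny Python helper for the conversion: a 120 FPS target leaves about 8.3 ms per frame, a 40 ms spike is 25 FPS, and the 150 ms+ spikes reported earlier correspond to under 7 FPS.

```python
def fps_from_frame_time(ms):
    """Frame rate implied if every frame took `ms` milliseconds."""
    return 1000.0 / ms

def frame_time_from_fps(fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps

if __name__ == "__main__":
    print(f"120 FPS -> {frame_time_from_fps(120):.1f} ms per frame")
    for spike_ms in (40, 150):
        print(f"{spike_ms} ms frame -> {fps_from_frame_time(spike_ms):.1f} FPS")
```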
@DarkArc, check if you have hostname with dashes, pointing to 127.0.0.1, in /etc/hosts. That was issue for me too, but ~/.steam/steam/logs/transport_steamui.txt log kinda pointed me in the right direction and the right issue: https://github.com/ValveSoftware/steam-for-linux/issues/10879#issuecomment-2171168395
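A quick way to check for the `/etc/hosts` condition described in that comment is the minimal Python sketch below, which lists hostnames containing a dash that are mapped to `127.0.0.1`. It only checks that literal address, mirroring the comment; whether the dash is really the trigger comes from the linked issue and is not something this script verifies.

```python
from pathlib import Path

def dashed_names_on_127(text):
    """Hostnames containing '-' that /etc/hosts maps to 127.0.0.1."""
    hits = []
    for raw in text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(tokens) >= 2 and tokens[0] == "127.0.0.1":
            hits.extend(name for name in tokens[1:] if "-" in name)
    return hits

if __name__ == "__main__":
    hosts = Path("/etc/hosts")
    if hosts.exists():
        for name in dashed_names_on_127(hosts.read_text()):
            print(f"/etc/hosts maps dashed hostname {name!r} to 127.0.0.1")
```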
I do not. I think the recent changes in https://github.com/ValveSoftware/csgo-osx-linux/issues/3803 fixed the issues I was having, but I haven't thoroughly tested since (thank you for the suggestion though!).
Your system information (Steam -> Help -> System Information) in a gist: No hardware change since last issue: https://gist.github.com/ipaqmaster/c89209d95358fb321b4fab54003ab418
Yes
Please describe your issue in as much detail as possible:
Describe what you expected should happen and what did happen. Please link any large pastes as a Github Gist.
Steps for reproducing this issue:
The game plays well, but occasionally the frame time shoots up to 150+ milliseconds in a hard hang/stutter, and the game's network latency also occasionally spikes to multiple hundreds of milliseconds, resulting in a hard rubber-banding experience for a brief moment.
I've put some assumptive troubleshooting into this with some overzealous scripting which:

- Uses `systemctl set-property` on the `system.slice` and `init.scope` to restrict their AllowedCPUs to the host's CPU threads 11 and 23 (the final core's two hyperthreads), offloading background tasks to a corner, out of the way of CS2's multiple busy threads.
- Sets `user.slice`'s AllowedCPUs to the remaining threads of the socket (0-10,12-22), which lets all other userspace processes such as CS2 execute on these cores while background tasks are cornered on the last core, alone.
- Sets those cores' CPU frequency governor to `performance` mode, keeping their clock stable around ~4GHz, while the last core for background tasks remains on `schedutil`.
- Finds the `cs2` PIDs, including all child task PIDs, and sets them to a higher scheduling priority (not quite realtime), combined with FIFO (First In First Out) scheduling on the game's threads.

I've found this helps a lot.
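The CPU split described above can be derived rather than hard-coded. Below is a minimal Python sketch, with hypothetical function names (not the scripting used here), that reserves the last physical core for `system.slice` and `init.scope` and gives everything else to `user.slice`. It assumes Linux numbers SMT siblings as n and n + cores, which matches the 11/23 pairing on this 12-core 3900X; verify with `lscpu -e` on your own machine. It only prints the `systemctl set-property` commands, and `--runtime` keeps the change from persisting across reboots.

```python
import sys

def to_ranges(cpus):
    """Compress CPU ids into AllowedCPUs list syntax, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    cpus = sorted(cpus)
    parts = []
    i = 0
    while i < len(cpus):
        j = i
        while j + 1 < len(cpus) and cpus[j + 1] == cpus[j] + 1:
            j += 1
        parts.append(str(cpus[i]) if i == j else f"{cpus[i]}-{cpus[j]}")
        i = j + 1
    return ",".join(parts)

def split_cpus(cores, threads_per_core=2):
    """Reserve the last physical core (all its SMT siblings) for background work.

    ASSUMPTION: sibling threads are numbered n, n + cores, n + 2*cores, ...
    """
    total = cores * threads_per_core
    background = {cores - 1 + t * cores for t in range(threads_per_core)}
    game = [cpu for cpu in range(total) if cpu not in background]
    return to_ranges(background), to_ranges(game)

def build_commands(cores):
    background, game = split_cpus(cores)
    # --runtime: the change lasts until reboot instead of being persisted
    cmds = [f"systemctl set-property --runtime {unit} AllowedCPUs={background}"
            for unit in ("system.slice", "init.scope")]
    cmds.append(f"systemctl set-property --runtime user.slice AllowedCPUs={game}")
    return cmds

if __name__ == "__main__":
    cores = int(sys.argv[1]) if len(sys.argv) > 1 else 12  # 12 cores = Ryzen 9 3900X
    print("\n".join(build_commands(cores)))
```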
Even though this 3900X AMD CPU isn't impacted by all of the known CPU vulnerabilities, I found that trying `mitigations=off` in the kernel's boot arguments further reduces, and sometimes fully eliminates, this out-of-nowhere hangup the game experiences during gameplay.

During the game's seemingly random network latency spikes (the real network shows no evidence of this event; it's strictly only CS2 experiencing this) and seemingly random frame rendering spikes, there are no logs in `dmesg -wH` or `sudo journalctl -f` to point blame at any process or scheduled task in particular.

The game just seems to hang rendering a frame every so often, and the same happens to its networking. The desktop experience is flawless and so are other games. I don't know why it has trouble like this.
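On the `mitigations=off` point: the kernel reports, for each known CPU vulnerability, whether this CPU is affected and which mitigation is active, under `/sys/devices/system/cpu/vulnerabilities/`. A small Python sketch that groups those entries by status, which shows which mitigations `mitigations=off` would actually be switching off on a given CPU. Grouping by the status prefix ("Not affected", "Vulnerable", "Mitigation") is a simplification; running it once with and once without the boot argument makes the difference visible.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

def classify(statuses):
    """Group vulnerability names by the prefix of their sysfs status string."""
    groups = {}
    for name, status in sorted(statuses.items()):
        if status.startswith("Not affected"):
            key = "not affected"
        elif status.startswith("Vulnerable"):
            key = "vulnerable"
        elif status.startswith("Mitigation"):
            key = "mitigated"
        else:
            key = "other"
        groups.setdefault(key, []).append(name)
    return groups

if __name__ == "__main__" and VULN_DIR.is_dir():
    statuses = {p.name: p.read_text().strip() for p in VULN_DIR.iterdir()}
    for key, names in classify(statuses).items():
        print(f"{key}: {', '.join(names)}")
    cmdline = Path("/proc/cmdline").read_text().split()
    print("mitigations=off on cmdline:", "mitigations=off" in cmdline)
```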