hparadiz closed this 4 months ago
@hparadiz Please check with schedtool $(pidof fossilize_replay) which priority the processes are using. They should already be scheduled as batch, using a very low group nice priority (not visible in top or schedtool, but in /proc/PID/autogroup).
Additionally, if supported by the system, fossilize will throttle itself using PSI if it detects spikes in IO latency to prevent locking up the desktop for multiple seconds. IO latency will spike if fossilize starts to dominate the page cache, RAM becomes low, or your filesystem cannot write data fast enough. In that case, it will put fossilize processes into stopped state. If your kernel isn't compiled with PSI, this won't work.
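To illustrate the mechanism (just a sketch with made-up threshold and timing values, not fossilize's actual implementation), the throttling boils down to something like this:
# sketch: pause the replay workers while the "some" IO pressure is above a threshold
threshold=10                                            # assumed value, percent over the last 10s
while sleep 1; do
    some_avg10=$(awk -F'[= ]' '/^some/ {print $3}' /proc/pressure/io)
    if awk -v a="$some_avg10" -v t="$threshold" 'BEGIN {exit !(a > t)}'; then
        kill -STOP $(pidof fossilize_replay)            # put the workers into stopped state
        sleep 5                                         # give the pressure time to decay
        kill -CONT $(pidof fossilize_replay)            # then let them continue
    fi
done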
The last thing is memory usage: if this gets high and you don't allow swapping, fossilize tends to dominate the page cache. This results in high desktop latency, and the system may take many minutes to recover from that memory shortage: it will appear frozen, but with disk activity. In that case, you should enable swap and not run with vm.swappiness=0.
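Checking and, if needed, changing it is straightforward (the value 20 below is just an example within the recommended range):
$ sysctl vm.swappiness
vm.swappiness = 60
$ sudo sysctl -w vm.swappiness=20                                        # runtime change
$ echo 'vm.swappiness = 20' | sudo tee /etc/sysctl.d/99-swappiness.conf  # persist across reboots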
Also, in the background, it should not use more than 4-5 processes - but I've seen reports here where this doesn't seem to work, and if fossilize runs on all cores, it will overwhelm the memory subsystem. Without process autogrouping or PSI, it will also overwhelm CPU and disk - where the latter is the biggest problem. Maybe your distribution removed the core limit for background processing from Steam to "make it magically faster"?
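You can quickly verify how many replay processes are actually running with:
$ pidof fossilize_replay | wc -w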
Fossilize uses shared memory between all processes (so memory usage is lower than it may appear), but it is not backed by a temporary disk file, it's just anonymous memory, so the kernel needs swap memory to compensate for high fossilize activity.
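If you want to see how that accounting looks on your system, the per-process counters in /proc give a rough idea (RssShmem is the part shared between the workers, VmSwap is whatever got pushed out to swap):
$ grep -E 'RssAnon|RssShmem|VmSwap' /proc/$(pidof -s fossilize_replay)/status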
To summarize: don't tune vm.swappiness down to zero; a value of 20-60 is recommended to avoid cache thrashing.
I'm on Gentoo.
My machine has 32 GB of RAM. The CPU is an AMD Ryzen 5950X.
Swap is:
/dev/nvme0n1p3 2887680 137105407 134217728 64G Linux swap
vm.swappiness = 60
Seeing 8 threads going. Looks like it is using SCHED_BATCH:
$ schedtool $(pidof fossilize_replay)
PID 30811: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 9455: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 9315: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 9216: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 9189: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 9147: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 9113: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 8813: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
PID 8806: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0xffffffff
$ cat /proc/6712/autogroup
/autogroup-27307 nice 19
I'll look into my kernel for PSI. I'm running a self-compiled 6.9.6.
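(For reference, with /proc/config.gz enabled via CONFIG_IKCONFIG_PROC, these are the options to check; CONFIG_PSI=y is what's needed, and if CONFIG_PSI_DEFAULT_DISABLED is set, psi=1 is also needed on the kernel command line.)
$ zgrep -E 'CONFIG_PSI(=|_DEFAULT)' /proc/config.gz
CONFIG_PSI=y
# CONFIG_PSI_DEFAULT_DISABLED is not set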
I was compiling chromium at the time.
Thank you for the lovely, detailed response.
I'm on Gentoo, too, and it works great here.
So everything looks fine on your system (autogroups are there, batch scheduling works, swappiness is fine, swap is there), except you need to check whether /proc/pressure/io exists (PSI for IO), which fossilize watches during the process. You can watch cat /proc/pressure/io yourself while it is running. "some" means that some processes are blocked on IO, "full" means that all non-idle processes are stalled on IO at the same time. "total" is the accumulated stall time in microseconds; the remaining numbers are averages over sliding windows of 10, 60 and 300 seconds. fossilize watches the "some" line.
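For reference, the file looks like this (the numbers here are purely illustrative):
$ cat /proc/pressure/io
some avg10=0.00 avg60=1.52 avg300=0.85 total=132886938
full avg10=0.00 avg60=1.10 avg300=0.62 total=101045731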
Since you are on Gentoo, I feel like you enjoy some technical details, so here's the background: I actually had the idea to add the PSI feature to fossilize (because without it, my system was really struggling with fossilize; I did the initial PoC patch, but the main author greatly improved it and knew much better where to put the control knobs), and I also initiated adding the autogroup nice, because nice'ing individual processes on a kernel with autogrouping has absolutely zero effect (nice only works within a process group, so the group itself needs to become nice). The batch scheduler gives processes slightly longer time slices for better CPU cache hit rates (I added that patch to fossilize), at the cost of being preempted more often by other processes (which gives them a slight priority disadvantage compared to SCHED_OTHER, aka interactive, processes - which is what we actually want).
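If you ever want to replicate that by hand for some other background job, the knobs look roughly like this (example commands only):
$ schedtool -B $(pidof fossilize_replay)                            # switch the workers to SCHED_BATCH
$ renice -n 19 -p $(pidof fossilize_replay)                         # per-process nice (little effect with autogrouping)
$ echo 19 | sudo tee /proc/$(pidof -s fossilize_replay)/autogroup   # nice the whole autogroup instead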
All these changes finally made fossilize an absolute non-issue for me: no matter what it does, there's no impact on the system (except the HDD making some more noise).
If you're using btrfs or bcache, you may want to look at my kernel patches for Gentoo. BTW: There's a Steam-centric kernel patch, too, which probably won't apply to 6.9: I'm only maintaining those patches for LTS kernels. So the next round of patches will come end of December or early January.
I was compiling chromium at the time.
The issue here is more likely Chromium. The linker phase of Chromium is begging for RAM. Try removing -pipe from your CFLAGS, and avoid compiling Chromium in tmpfs if you've set that up. You can set a different portage location per package using package.env.
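For the package.env part, a minimal setup could look like this (file names and flag values are just examples; adjust to your make.conf):
# /etc/portage/env/chromium-build.conf
CFLAGS="-O2 -march=native"                 # same as make.conf, just without -pipe
CXXFLAGS="${CFLAGS}"
PORTAGE_TMPDIR="/var/tmp/portage-disk"     # build on disk instead of a tmpfs

# /etc/portage/package.env
www-client/chromium chromium-build.conf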
It was PSI. I compiled it into my kernel and now things seem more performant.
I'm all ears for any other kernel things I should check for.
It was PSI. I compiled it into my kernel and now things seem more performant.
Great to see that this fixed it - that is the intention of the PSI support in fossilize.
I'm all ears for any other kernel things I should check for.
Then maybe look at my kernel patches I'm using for Gentoo (but I'm using only LTS kernels): https://github.com/kakra/linux/pulls
PSI is not always relevant or a "big deal". Normally you won't get any advantage with that much CPU and RAM.
PSI is mostly for underpowered configurations (Steam Deck) or laptops.
I am a Gentoo user too, but fossilize is driving me crazy. You have several choices, but none of them are optimal at all.
So, even with all cores, it takes about 30 minutes on my PC, just for Assetto Corsa, Counter-Strike, Counter-Strike: Source, Counter-Strike 2, Dead Island, the Flatout Collection, Wreckfest, and the Metro Collection.
I noticed that it can take much longer than usual if there are several Vulkan API implementations installed, like amdvlk and amdgpu-pro-vulkan.
Please share your thoughts about this.
My configuration -> https://gist.github.com/sandikata/b5594bd79b35fc5dd556c3ff26189948
If you're using btrfs (according to your system info), the problem may actually be not having PSI in the kernel. The write patterns caused by fossilize are very aggressive towards btrfs (random reads and writes via mmap) and its kernel memory allocator, leading to IO stutters. This has been a problem with KDE Baloo's index database but also with fossilize.
So your choices are probably:
This is most likely not a CPU-usage issue. fossilize will use extremely low-priority CPU, and it also creates its own process group for the auto-grouping scheduler, so fossilize only acts as a single fair-share CPU user towards the rest of the system (priority-wise; it will still use multiple cores and processes).
PSI is actually NOT a system for underpowered configurations. It's a system for processes to detect whether they themselves are going to cause bottleneck situations because the system is otherwise busy, or to let an admin plan for better resource sharing - such processes and services can then take proper countermeasures, e.g. pausing IO (which is what fossilize does), flushing caches (to reduce memory pressure from the dirty cache), or reducing threads. It's similar to watching the loadavg, but PSI can look at each bottleneck individually, or even per cgroup.
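For example, you can watch all three resources at once while fossilize is running:
$ watch -n1 grep -H some /proc/pressure/cpu /proc/pressure/io /proc/pressure/memory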
I am not sure if I know a way to move specific data from Steam to a different location.
Feature Request
I confirm:
Description
Please make fossilize_replay check the load average against the number of cores and avoid launching threads at normal priority.
Justification
When another high-CPU-usage task is running, the machine can lock up due to many fossilize_replay threads running in the background.
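For illustration only, the requested check could look roughly like this (a sketch of the idea, not an actual patch; fossilize would do the equivalent internally before spawning workers):
# hypothetical: don't spawn more replay workers while the 1-minute load exceeds the core count
cores=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)
if awk -v l="$load1" -v c="$cores" 'BEGIN {exit !(l >= c)}'; then
    echo "load ${load1} >= ${cores} cores: defer spawning more fossilize_replay workers"
fi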
Risks
Low
References
https://github.com/ValveSoftware/Proton/issues/7000