ValveSoftware / Fossilize

A serialization format for various persistent Vulkan object types.
MIT License

Compiling Vulkan shaders with less than 50% of free memory leads to OOM situation and system freeze #227

Open Slater91 opened 1 year ago

Slater91 commented 1 year ago

Your system information

Please describe your issue in as much detail as possible:

My system does not have swap configured, which means once the RAM is all occupied the system can't simply use swap to get out of an OOM situation.

What I expect to happen: Steam detects how much memory is available and spawns fossilize_replay processes accordingly so as not to get into an OOM situation, instead of spawning a process for each logical CPU.

What actually happens: Steam spawns the maximum number of fossilize_replay processes it thinks the system supports (in my case, 16 processes). As each process takes up ~512 MB of RAM, this means the system needs at least 8 GB of free RAM. Whenever this is not the case, e.g. because you have other apps open, the system freezes and stops responding. I have tried leaving the system in this state for up to 15 minutes, but the only way to get out of this freeze is to perform a reset (ctrl+alt+printscreen+REISUB).
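The expected behaviour above can be sketched as a small calculation: cap the worker count at whichever is smaller, the number of logical CPUs or the number of ~512 MB processes that fit into available memory. This is only an illustration of the arithmetic in this report. `max_workers` is a hypothetical helper, not something Steam or Fossilize is known to implement, and the 512 MB per-process figure is the observation from this system.

```shell
#!/bin/sh
# Hypothetical helper: cap the number of replay workers by available memory.
# Arguments: logical CPUs, available memory in KiB, assumed per-process cost in KiB.
max_workers() {
    cpus=$1
    avail_kib=$2
    per_proc_kib=$3
    by_mem=$((avail_kib / per_proc_kib))
    # Always allow at least one worker so compilation can still make progress.
    if [ "$by_mem" -lt 1 ]; then by_mem=1; fi
    if [ "$by_mem" -lt "$cpus" ]; then echo "$by_mem"; else echo "$cpus"; fi
}

# Numbers from this report: 16 logical CPUs, 512 MiB per process,
# and (as an example) 6 GiB of available memory.
max_workers 16 $((6 * 1024 * 1024)) $((512 * 1024))   # prints 12

# On a live Linux system the inputs could come from nproc and MemAvailable:
# max_workers "$(nproc)" "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)" $((512 * 1024))
```

With 6 GiB available this yields 12 workers instead of 16, which keeps the estimated total under the available memory.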

I have experienced this several times and it is consistently reproducible.

Steps for reproducing this issue:

  1. On a system with 16 GB of RAM and a 16-thread CPU, as well as no swap configured, open applications until you have more than 8 GB occupied.
  2. Start Vulkan shader compilation.

kakra commented 1 year ago

Well, as far as I understood, fossilize itself uses shared memory across all processes. So this is probably rather a behavior of the graphics driver. In htop, maybe look at column SHR and subtract that from the RES usage - or simply look at PSS.
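As a rough sketch of the PSS check suggested above, the per-process PSS can be read from `/proc/<pid>/smaps_rollup` (available on Linux 4.14 and later) and summed over all replay processes. The helper names `pss_kib` and `total_pss_kib` are made up for this example. Matching the processes with `pgrep -f fossilize_replay` is an assumption about how they appear in the process list.

```shell
#!/bin/sh
# Print the Pss value (in KiB) from smaps_rollup-formatted text.
# Reads the given file, or stdin when no file is given.
pss_kib() {
    awk '/^Pss:/ { print $2; exit }' "$@"
}

# Sum PSS over all processes whose command line matches the given pattern.
# pgrep -f is used because plain name matching is limited to 15 characters,
# and "fossilize_replay" is 16 characters long.
total_pss_kib() {
    total=0
    for pid in $(pgrep -f "$1"); do
        v=$(pss_kib "/proc/$pid/smaps_rollup" 2>/dev/null)
        total=$((total + ${v:-0}))
    done
    echo "$total"
}

# Parsing example on sample text (values are illustrative, not measured):
printf 'Rss:    2048 kB\nPss:    1234 kB\nPss_Anon:    1000 kB\n' | pss_kib

# On a live system, while shaders are compiling:
# total_pss_kib fossilize_replay
```

Because PSS divides each shared page among the processes mapping it, the sum is a fairer estimate of the real memory cost than adding up RES values.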

Could you check how /proc/buddyinfo evolves during processing? Maybe take a snapshot each 15 seconds:

while true; do date; cat /proc/buddyinfo; sleep 15; done | tee buddyinfo.log

Also, I don't think running a Linux system without swap is a supported configuration unless you disable overcommit, which you probably don't want to do. You should have at least some swap, maybe even as zram; 512 MB to 2 GB should be enough. Swap doesn't make your system slower. Having too little RAM does, because you will experience cache thrashing under memory pressure, and this may result in early memory fragmentation (as the logger above could show), which makes it harder for the kernel to use your RAM effectively.

Does your system have PSI enabled (see /proc/pressure/io)? If it does, fossilize should pause threads when IO starts to thrash. That won't free memory, but it allows caches to be reused for system interactivity. It will still need swap, though: a paused fossilize doesn't free its memory, so the kernel must be able to swap it out (it's anonymous memory, which cannot simply be paged back in from an executable or data file on disk).
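A minimal sketch for checking PSI from a shell, assuming the usual `/proc/pressure/io` format of a `some` line and a `full` line with `avg10=`, `avg60=`, `avg300=` and `total=` fields. The helper name `psi_some_avg10` is made up for this example.

```shell
#!/bin/sh
# Extract the avg10 value of the "some" line from PSI-formatted text on stdin.
psi_some_avg10() {
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2; exit }'
}

# Report whether PSI is usable, and if so show the current IO pressure.
v=$(cat /proc/pressure/io 2>/dev/null | psi_some_avg10)
if [ -n "$v" ]; then
    echo "PSI available, io some avg10=$v"
else
    # Typical causes: kernel built without CONFIG_PSI, or PSI disabled at boot
    # (it can be enabled with the psi=1 kernel parameter on such kernels).
    echo "PSI not available"
fi
```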

Slater91 commented 1 year ago

I have just tested this. The amount of shared memory appears to be minimal, so the effective usage of each process appears to be around 500 MB of RAM. Please see the screenshot below: Screenshot_20230713_230304

Here is the buddyinfo log: buddyinfo.log (shader compilation had already ended by the last three entries)

As for PSI, it is not enabled.

Regarding the claim that swap doesn't make your system slower: it actually does, and this is the only reason why I disabled it. The system becomes visibly, noticeably slower and laggier when it starts swapping. It remains snappy and fast when swap is disabled (or once I disable it and everything is brought back into RAM). On top of this, I am currently experiencing a bug which makes the system swap out aggressively even when just 4 out of 16 GB are occupied, which makes it basically impossible to use the computer normally.

I don't know if this can help, but this is from journalctl when the system froze during compilation: journal.log It says there are 16 GB of swap because it actually was there, but I had set vm.swappiness=0 due to the aforementioned issue with constant swapping.

kakra commented 1 year ago

> On top of this, I am currently experiencing a bug which makes the system swap out aggressively even when just 4 out of 16 GB are occupied, which makes it basically impossible to use the computer normally.

Yes, that's usually due to memory fragmentation. At this point it looks quite fragmented but it recovers later:

Thu 13 Jul 2023, 23:00:52, BST
Node 0, zone      DMA      1      1      1      1      1      1      1      1      0      2      2 
Node 0, zone    DMA32   2117   1684   1392   1244    351     88     26      7      5      5      2 
Node 0, zone   Normal  23083   4417   2677    984    262     23      1      2      2      1      0 

("Normal" is the interesting data point here: 23083 isolated free 4k pages, then 4417 free 8k blocks, 2677 free 16k blocks, and so on. That means if the kernel needs 2 MB of contiguous memory, it will find almost none (the second-to-last column shows a single free 2 MB block) and has to swap out other pages or flush cache to create an area of 512 consecutive 4k pages. Take note: these are small isolated areas of memory because they are surrounded by memory that the kernel cannot move or swap out; user-space memory is movable.)
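To make the reading of the columns concrete, here is a small sketch that summarizes one buddyinfo line. It assumes 4 KiB base pages, so column n (starting at 0) counts free blocks of 4 KiB × 2^n, and a 2 MiB block corresponds to order 9. The helper name `buddy_summary` is made up for this example.

```shell
#!/bin/sh
# Summarize /proc/buddyinfo lines: total free KiB and the number of free
# blocks of at least 2 MiB (order >= 9), assuming 4 KiB base pages.
buddy_summary() {
    awk '{
        free_kib = 0; big = 0
        # Fields 1-4 are "Node", "0,", "zone", "<name>"; counts start at field 5 (order 0).
        for (i = 5; i <= NF; i++) {
            order = i - 5
            free_kib += $i * 4 * 2 ^ order
            if (order >= 9) big += $i
        }
        printf "%s free_kib=%d blocks_ge_2MiB=%d\n", $4, free_kib, big
    }'
}

# The "Normal" line from the log above:
echo "Node 0, zone   Normal  23083   4417   2677    984    262     23      1      2      2      1      0" | buddy_summary
# prints: Normal free_kib=227076 blocks_ge_2MiB=1

# On a live system:
# buddy_summary < /proc/buddyinfo
```

So about 222 MiB are free in that zone, but only one block is large enough to back a 2 MiB allocation without compaction or reclaim.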

Interestingly, SHR usage of fossilize is more like 180 MB for me (with RES around 256 MB).

Could you try running this after booting, and only then start fossilize?

echo within_size | sudo tee /sys/kernel/mm/transparent_hugepage/shmem_enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo defer+madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
echo 64 | sudo tee /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
echo 8 | sudo tee /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
echo 32 | sudo tee /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared

It may also help your situation with only 4 GB occupied, and the system should not start swapping early (if you enabled swap). If it helps, your distribution should change hugepage defaults.
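Before changing the settings above, it may be worth recording the current values so they can be restored. The sketch below assumes the usual sysfs format, where the active choice is shown in square brackets (for example `always [madvise] never`). The helper name `thp_active` is made up for this example.

```shell
#!/bin/sh
# Print the active choice from a THP sysfs line such as "always [madvise] never".
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Show the active value of each bracket-style setting, if the file exists.
for f in enabled defrag shmem_enabled; do
    p=/sys/kernel/mm/transparent_hugepage/$f
    if [ -r "$p" ]; then
        echo "$f: $(thp_active < "$p")"
    fi
done

# The khugepaged tunables are plain numbers and can be read directly.
for f in max_ptes_none max_ptes_swap max_ptes_shared; do
    p=/sys/kernel/mm/transparent_hugepage/khugepaged/$f
    if [ -r "$p" ]; then
        echo "$f: $(cat "$p")"
    fi
done
```

Values written with `tee` as shown earlier do not survive a reboot, so the recorded values only matter for undoing the change within the same session.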

Also, try booting with the cgroup_disable=memory kernel option. Try different combinations of this option and the settings above to see how they affect behavior. I found that the cgroup memory controller currently doesn't behave very well under memory pressure and causes early swapping (with aggressive latency spikes).

Slater91 commented 11 months ago

Sorry for my terribly late reply. Thank you so much for your explanations. I have tested this on multiple kernel versions at this point, and both with and without the commands you provided. It looks like the issue with swap kicking in way too early was fixed in the meantime, so that specific one is not there any more (to be specific, I am using the Xanmod kernel, as you can see from the report I opened in their GitHub).

When it comes to the behaviour of fossilize_replay, however, I see no difference: if I have less than 8 GB of free memory, the system still freezes when swapping is disabled, as the memory usage of fossilize_replay processes stays the same even after running the commands you gave me. I have tried booting with cgroup_disable=memory and while it does seem to lower the overall memory consumption of applications, it doesn't seem to affect fossilize_replay.