ValveSoftware / Fossilize

A serialization format for various persistent Vulkan object types.
MIT License

[Shadow of the Tomb Raider] fossilize eats all RAM until it OOMs #194

Open philipl opened 2 years ago

philipl commented 2 years ago

The symptoms here seem to be the same as #84 but that one was fixed and closed, so I was asked to file a new issue.

In the last month or two, I've noticed that fossilize seems happy to consume all system memory without bound. This is particularly evident with Shadow of the Tomb Raider (native Linux version): fossilize consistently gobbles up all the memory and eventually gets OOM-killed, and then you have to kick it by turning background processing off and on again. It does eventually complete, because it doesn't start from zero each time, but it makes the system unusable for anything else until it finishes. I've been preemptively turning background processing off and on while watching memory usage to get it to complete a bit faster, but that's no fun.

I suspect this behaviour is not unique to SotTR; it's just that this game has enough work to do that it can saturate my system.

System Details:

Thanks.

kakra commented 2 years ago

I wonder if adding 1-2 GB of swap would work around the issue? The Linux kernel memory manager does not really like running completely without swap unless you carefully control the memory resources of your processes (multi-generational LRU memory management should solve this, afaik; Google developers are currently working to upstream such a patch, which is or will be used on Android). Also, if you have /proc/pressure/{io,memory}, fossilize should be able to control its memory usage before it pushes the system into an OOM situation. This is enabled through the kernel's PSI (Pressure Stall Information) feature.
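For reference, each PSI file (/proc/pressure/cpu, io, memory) contains a `some` line and usually a `full` line, each with `avg10`/`avg60`/`avg300` percentages of time stalled and a cumulative `total` stall time in microseconds. The following is a minimal Python sketch of how a userspace process could read memory pressure. It is not Fossilize's code, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avgN values are percentages of time stalled; total is in microseconds.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path: str = "/proc/pressure/memory") -> dict[str, dict[str, float]] | None:
    """Return parsed memory PSI, or None if the kernel does not expose it."""
    try:
        return parse_psi(Path(path).read_text())
    except OSError:
        # Missing file (old kernel, no CONFIG_PSI) or PSI disabled at boot.
        return None


if __name__ == "__main__":
    sample = (
        "some avg10=12.50 avg60=4.10 avg300=1.02 total=9876543\n"
        "full avg10=3.25 avg60=0.80 avg300=0.20 total=1234567\n"
    )
    psi = parse_psi(sample)
    print(psi["some"]["avg10"])  # 12.5
    print(read_memory_pressure())
```

The `some` line counts time where at least one task was stalled on memory; `full` counts time where all non-idle tasks were stalled, which is the stronger signal that the system is thrashing.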

philipl commented 2 years ago

The kernel I've been using has /proc/pressure/{io,memory} so by itself that was certainly not enough to prevent this behaviour.

philipl commented 2 years ago

I've tested with 2GB of swap and 32GB of swap, and even then it's happy to gobble up all memory and then OOM. The swapfile makes no difference to how fossilize behaves.

kakra commented 2 years ago

What are the resident and virtual sizes of the fossilize processes when the problem builds up? Is it really fossilize itself, or is the memory mostly going to page cache and dirty pages?
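One way to answer that question is to compare per-process `VmRSS`/`VmSize` from /proc/<pid>/status with the system-wide `Cached` and `Dirty` counters in /proc/meminfo. A minimal Python sketch follows; the helper names are hypothetical, and it assumes the replayer processes are named `fossilize_replay`.

```python
from __future__ import annotations

from pathlib import Path

# Per-process fields from /proc/<pid>/status (values in kB).
# RssAnon/RssFile split resident memory into anonymous (heap etc.)
# and file-backed pages; they exist on Linux 4.5 and later.
PROCESS_FIELDS = ("VmSize", "VmRSS", "RssAnon", "RssFile")
# System-wide fields from /proc/meminfo (values in kB).
SYSTEM_FIELDS = ("MemAvailable", "Cached", "Dirty")


def parse_kb_fields(text: str, fields: tuple[str, ...]) -> dict[str, int]:
    """Extract 'Key:   12345 kB' style values for the requested keys."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            values[key] = int(rest.split()[0])
    return values


def pids_with_comm(name: str) -> list[int]:
    """Return PIDs whose /proc/<pid>/comm equals name.

    The kernel truncates comm to 15 characters, so compare against name[:15].
    """
    proc = Path("/proc")
    if not proc.is_dir():
        return []
    pids: list[int] = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited or is not readable
        if comm == name[:15]:
            pids.append(int(entry.name))
    return pids


if __name__ == "__main__":
    # Assumption: the replayer binary is named "fossilize_replay".
    for pid in pids_with_comm("fossilize_replay"):
        try:
            status = Path(f"/proc/{pid}/status").read_text()
        except OSError:
            continue
        print(pid, parse_kb_fields(status, PROCESS_FIELDS))
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print("system", parse_kb_fields(meminfo.read_text(), SYSTEM_FIELDS))
```

If `RssAnon` accounts for most of the lost memory, the process itself is holding it; if instead `Cached` and `Dirty` grow while the processes stay small, the pressure comes from the page cache.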

philipl commented 2 years ago

It's basically just fossilize itself. Here is the state of affairs just before I run out of 64GB on my system:

[screenshot of memory usage]

philipl commented 2 years ago

FWIW, when all is said and done, my on disk cache size for SotTR is 1.3GB.

kakra commented 2 years ago

Did you manually increase the number of fossilize workers "to speed things up"? It should run two workers by default, and put workers into the T (stopped) state if something is running out of control. None of this seems to work for you. Or maybe you're running a Flatpak version of Steam, which may not be able to access PSI?
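The T state is what `ps` and `top` show for a process stopped by `SIGSTOP`; `SIGCONT` resumes it. The following Python sketch shows one way a supervisor could pause and resume workers based on a memory-pressure reading, using two thresholds (hysteresis) so workers do not flap between states. The thresholds and function names are hypothetical and this is not how Fossilize actually decides when to stop workers.

```python
from __future__ import annotations

import os
import signal

# Hypothetical thresholds on PSI memory "some avg10" (percent of time stalled).
# Illustrative values only, not Fossilize's actual tuning.
STOP_ABOVE = 20.0
RESUME_BELOW = 5.0


def next_state(paused: bool, some_avg10: float,
               stop_above: float = STOP_ABOVE,
               resume_below: float = RESUME_BELOW) -> bool:
    """Return whether workers should be paused, with hysteresis.

    Two thresholds avoid flapping between stopped and running
    when pressure hovers near a single cut-off.
    """
    if not paused and some_avg10 > stop_above:
        return True
    if paused and some_avg10 < resume_below:
        return False
    return paused


def apply_state(worker_pids: list[int], paused: bool) -> None:
    """SIGSTOP puts a process into the T (stopped) state; SIGCONT resumes it."""
    sig = signal.SIGSTOP if paused else signal.SIGCONT
    for pid in worker_pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # worker already exited


if __name__ == "__main__":
    paused = False
    trace = []
    for reading in [1.0, 25.0, 12.0, 4.0, 8.0]:
        paused = next_state(paused, reading)
        trace.append(paused)
    print(trace)  # [False, True, True, False, False]
```

Note that the reading of 12.0 keeps workers paused because it has not yet dropped below the resume threshold. A stopped worker still holds its memory, so pausing only helps if the remaining work frees enough for the system to recover.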

philipl commented 2 years ago

This is an Ubuntu system with the standard deb bootstrap package that is then downloading and running the client. No flatpak cleverness or anything like that. I have not tried to tweak the number of workers - I didn't even know it was possible. It's just doing whatever it does.

philipl commented 2 years ago

Just for fun, I added 128GB of swap and it was still happy to OOM.

kakra commented 2 years ago

Maybe related to #196 which reproduces it using the pipeline cache and running fossilize manually?

philipl commented 2 years ago

Latest update. With a 6.0.x kernel, the OOM killer no longer kicks in. I still experience multiple seconds of system unresponsiveness when memory is exhausted but the fossilize processes do back off in response to pressure and nothing gets killed. I guess the kernel requirements here are exceptionally steep.