Is there a way to avoid unresponsive system due to high cache pressure on Linux?

Alexey104 commented 2 years ago

When seeding 24/7 on the main production Linux machine, the system becomes quite laggy and unresponsive over time. In htop I can see a huge amount of system cache/buffers being used while the torrent client is running, which causes other useful caches to be constantly evicted by the torrent cache. If a swap partition is mounted, it always fills up to 100% eventually, regardless of its size. I've tried setting disk_io_read_mode to disable_os_cache, but this setting seems to be deprecated and completely ignored. I've also tried tweaking some sysctl parameters such as vfs_cache_pressure, but that didn't show any meaningful improvement, so I just let the system use its defaults. What really seems to help is limiting the amount of RAM available to the torrent client with cgroups. It makes the system lag less, but it doesn't solve the problem completely; it just slows down the negative effects.

Is there anything else I can try to mitigate the aggressiveness of the torrents cache pressure on Linux?

arvidn commented 2 years ago

I take it you tried setting vm.vfs_cache_pressure ~~to 200, which I believe is the highest setting~~ to a low value. With cgroups, are you saying the process eventually ends up using more memory than the limit?

A few libtorrent settings you could try:

close_file_interval; on Linux this is disabled by default. If you set it to a non-zero value, one of the open files will be closed periodically (and possibly re-opened if there's still demand to read from or write to it). This can help nudge the kernel into evicting the cache for that file.

file_pool_size; This is the number of file handles (and file mappings) to keep open at any given time. With a lower value, fewer files will be mapped at a time, possibly causing the kernel to keep less in its cache. However, if all of your files are larger than the physical amount of RAM, it might not make any difference.

disk_io_read_mode is not deprecated, and setting it to disable_os_cache will cause libtorrent to hint that the cache is no longer needed immediately after reading it, using madvise(MADV_COLD) (if available) and msync(MS_INVALIDATE) (although the latter probably doesn't do anything).
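For anyone using the Python bindings, the three settings above could be collected roughly like this. This is only a sketch: the helper names are made up for illustration, the file_pool_size default of 20 is an arbitrary placeholder, and the enum spelling `lt.io_buffer_mode_t.disable_os_cache` in the usage comment is an assumption about the bindings that should be checked against the installed version.

```python
def build_cache_settings(read_mode, close_file_interval=60, file_pool_size=20):
    """Return a dict keyed by the libtorrent setting names discussed in this thread.

    read_mode is whatever value the bindings use for disable_os_cache;
    it is passed in rather than hard-coded because the numeric value is
    not stated in this thread.
    """
    if close_file_interval < 0:
        raise ValueError("close_file_interval must be >= 0 (0 disables it)")
    if file_pool_size < 1:
        raise ValueError("file_pool_size must be at least 1")
    return {
        "close_file_interval": close_file_interval,  # seconds; 0 means disabled
        "file_pool_size": file_pool_size,            # hypothetical default, tune it
        "disk_io_read_mode": read_mode,
    }


def apply_to_session(session, settings):
    """Apply the dict to an object exposing apply_settings(dict).

    Assumption: `session` is a libtorrent.session from the Python bindings.
    """
    session.apply_settings(settings)


# Usage (assumed enum name, verify before relying on it):
#   import libtorrent as lt
#   ses = lt.session()
#   apply_to_session(ses, build_cache_settings(lt.io_buffer_mode_t.disable_os_cache))
```

Passing the read mode in as an argument keeps the sketch independent of the exact enum layout in any given libtorrent release.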

Please report back your findings!

arvidn commented 2 years ago

It's probably worth trying a lower value for /proc/sys/vm/swappiness. If I understand correctly, this controls the priority between anonymous pages and buffer cache pages. You want to prioritize anonymous memory.
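A small sketch for checking the current values before changing anything. The helper names are made up; the example values 10 and 50 are only placeholders, not recommendations.

```python
from pathlib import Path


def read_vm_sysctl(name, root="/proc/sys/vm"):
    """Return the integer value of a vm.* sysctl such as 'swappiness'."""
    return int(Path(root, name).read_text().strip())


def sysctl_conf_lines(values):
    """Format {'swappiness': 10} as 'vm.swappiness = 10' style lines."""
    return [f"vm.{key} = {value}" for key, value in sorted(values.items())]


if __name__ == "__main__":
    # Print what the running kernel currently uses.
    for knob in ("swappiness", "vfs_cache_pressure"):
        print(knob, read_vm_sysctl(knob))
    # Placeholder values; pick your own after experimenting.
    print("\n".join(sysctl_conf_lines({"swappiness": 10, "vfs_cache_pressure": 50})))
```

The formatted lines can be pasted into a sysctl configuration file once a value turns out to help.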

arvidn commented 2 years ago

I found this interesting: https://rudd-o.com/linux-and-free-software/tales-from-responsivenessland-why-linux-feels-slow-and-how-to-fix-that

Alexey104 commented 2 years ago

Thank you, Arvid, I will try your suggestions.

> With cgroups, are you saying the process eventually ends up using more memory than the limit?

Well, I thought that with cgroups I would be able to limit both "dirty" memory and caches to the size defined in memory.high/memory.max. But it seems that those settings only limit the maximum amount of dirty memory used by a process, still letting the system use as much memory as it wants for buffers/cache: [screenshot: caching]

And this is the problem: as you can see in the screenshot above, Deluge is only using ~700 MB of RAM, 500 MB of which is the resident cache (I use RC_1_2). I think this is normal memory usage when seeding several hundred torrents, so no problems with that. But the remaining memory is filled up with buffers/cache, even though no other significant processes are running and memory.high for Deluge is set to 2 GB with cgroups (the same happens with memory.max). There is no space left for other processes' caches, which makes the system slow and jerky. However, cgroups still seems to help somehow, because swapping is much slower and less aggressive than when cgroups is not used (swap still reaches 100% after a while with vm.swappiness set to 10, though).
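One way to see what is actually being counted against the group is to split its memory.stat into anonymous and file-backed (page cache) bytes. This is a sketch that assumes cgroup v2, where memory.stat has `anon` and `file` lines; the cgroup path in the usage comment is hypothetical.

```python
def split_cgroup_memory(stat_text):
    """Return anon and file byte counts from the text of a cgroup v2 memory.stat."""
    stats = {}
    for line in stat_text.splitlines():
        key, _, value = line.partition(" ")
        value = value.strip()
        if value.isdigit():
            stats[key] = int(value)
    # 'anon' is process memory, 'file' is page cache charged to the group.
    return {"anon": stats.get("anon", 0), "file": stats.get("file", 0)}


# Usage (hypothetical cgroup name "torrent"):
#   with open("/sys/fs/cgroup/torrent/memory.stat") as f:
#       print(split_cgroup_memory(f.read()))
```

Comparing the `file` figure with the system-wide buffers/cache number gives a rough idea of how much of the cache the limit really covers.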

So, different sysctl/cgroups tweaks might slow down memory pressure, but they don't help to avoid the issue completely. I understand that this is not actually a Libtorrent issue; this is just how the operating system works. I'm just looking for ways to trick the system. The close_file_interval/file_pool_size/disk_io_read_mode settings look interesting, and I will try to tweak them. There will be no electricity in my house for two days, but I will report back in a few days.

Alexey104 commented 2 years ago

I think I managed to achieve significant improvements with close_file_interval = 60 and disk_io_read_mode = disable_os_cache, in combination with vm.vfs_cache_pressure = 50 and a 2 GB memory limit for the torrent client set with cgroups. It really feels much better now, even though htop shows no difference: still 100% swap usage after several hours of seeding, and all available memory is polluted with the torrent cache. It seems that caching is a low-level mechanism hardcoded into the kernel itself, and you cannot just force the OS not to cache things that you don't want cached.
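To put numbers on what htop shows, the same figures can be read from /proc/meminfo. A minimal sketch with made-up helper names; values in /proc/meminfo are reported in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def cache_and_swap_summary(info):
    """Summarise page cache size (kB) and swap usage (percent)."""
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "page_cache_kb": info.get("Cached", 0) + info.get("Buffers", 0),
        "swap_used_pct": round(100 * swap_used / swap_total, 1) if swap_total else 0.0,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(cache_and_swap_summary(parse_meminfo(f.read())))
```

Logging this summary every few minutes while seeding makes it easier to compare settings than watching htop by eye.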

Alexey104 commented 2 years ago

OK, this issue seems to be related to the OS memory management facilities rather than to Libtorrent itself. Thus, I don't see a reason to keep this open.

Alexey104 commented 2 years ago

Just want to report that after messing with a lot of system and Libtorrent settings, I found the best solution for me to be disabling swap completely. Forever. This does not prevent high cache pressure, but at least application resident data always stays in RAM and never gets swapped out to disk in favor of the torrent cache. Some insignificant freezes can still be observed when you switch to an application that has been idling in the background for many hours, but those freezes are really minor.
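To confirm that no swap is active after a reboot, /proc/swaps can be checked. A minimal sketch, assuming the usual layout of one header line followed by one line per active swap area; the helper name is made up.

```python
def active_swap_devices(swaps_text):
    """Return the names of active swap areas from /proc/swaps style text."""
    lines = swaps_text.strip().splitlines()
    # The first line is the column header, the rest are active swap areas.
    return [line.split()[0] for line in lines[1:] if line.split()]


if __name__ == "__main__":
    with open("/proc/swaps") as f:
        devices = active_swap_devices(f.read())
    print("swap is off" if not devices else "active swap: " + ", ".join(devices))
```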

I personally don't consider swap to be a bad thing and have always had a 500 MB swap partition. But torrenting seems to be a special case where swap can be undesirable.

arvidn / libtorrent

Is there a way to avoid unresponsive system due to high cache pressure on Linux? #6937