NixOS / nixpkgs

zramswap.memoryPercent is incorrect or at least misleading #103106

Open bhansconnect opened 3 years ago

bhansconnect commented 3 years ago

Describe the bug: Currently zramSwap.memoryPercent is defined as:

Maximum amount of memory that can be used by the zram swap devices
(as a percentage of your total memory). Defaults to 1/2 of your total
RAM. Run <literal>zramctl</literal> to check how good memory is
compressed.

with a default value of 50 percent of total RAM.

This value is then used to compute the value passed to zramctl --size. That sets the disk size of the zram device, which is a limit on the amount of *uncompressed* data, not on memory actually consumed. Zram tends to achieve a compression ratio between 2:1 and 3:1 (generally much closer to 3:1). What this means in reality is that specifying a memoryPercent of 50 causes only about 15-25% of RAM to actually be used by zram.
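To make the arithmetic concrete, here is an illustrative calculation (the 16 GiB machine and the 3:1 ratio are assumptions for the example, not measurements):

    # Illustrative shell arithmetic for a hypothetical 16 GiB machine:
    ram_gib=16
    disksize_gib=$(( ram_gib * 50 / 100 ))   # what memoryPercent = 50 yields: 8 GiB
    echo "disksize: ${disksize_gib} GiB (a cap on *uncompressed* data)"
    # At the assumed ~3:1 ratio, the compressed footprint is only a third of that:
    echo "real usage: ~$(( disksize_gib * 100 / 3 / ram_gib ))% of RAM"   # integer math; ~17%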

Expected behavior: When using a memoryPercent of 50, zram should actually use 50% of memory. For zram to be able to hold roughly half of your RAM in compressed form (at a ~3:1 ratio), zramctl --size should be set to approximately 150% of total memory. In case your data is not very compressible and you want to enforce a hard RAM limit of exactly 50%, you can additionally write that byte value to /sys/block/zram0/mem_limit (e.g. echo 8G > /sys/block/zram0/mem_limit on a 16 GiB machine).
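A minimal sketch of that manual setup, not the NixOS module's actual implementation (it assumes /dev/zram0 exists, e.g. after modprobe zram, and is not yet in use; note that later comments in this thread report lockups when the hard mem_limit is actually hit):

    # Sketch only. Sizes are derived from MemTotal in /proc/meminfo.
    mem_bytes=$(( $(awk '/^MemTotal/ {print $2}' /proc/meminfo) * 1024 ))
    # Virtual disk size, i.e. the cap on *uncompressed* data: 150% of RAM.
    echo $(( mem_bytes * 3 / 2 )) > /sys/block/zram0/disksize
    # Optional hard cap on memory actually consumed by compressed data: 50% of RAM.
    echo $(( mem_bytes / 2 ))     > /sys/block/zram0/mem_limit
    mkswap /dev/zram0
    swapon --priority 100 /dev/zram0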

I think at a minimum the description of memoryPercent should be rewritten, but if we want things to be as easy as possible to configure, we should expose both parameters, mem_limit and disksize, mentioning the expected compression ratio of 3:1. We should also ship good defaults of mem_limit = 50% of memory and disksize = 150% of memory.

I have never done Nix development before, but I could probably make a pull request for this if wanted.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

Artturin commented 1 year ago

The answer here sums it up well: https://unix.stackexchange.com/a/596929

So after more testing and observations, I made a few very interesting discoveries. DATA is indeed the uncompressed amount of memory that takes up the swap space, but at first glance it's very deceiving and confusing. When you set up zram and use it as swap, disksize does not stand for the total amount of memory that zram will consume for compressed data. Instead, it stands for the total amount of uncompressed data that zram will compress. So you could create a zram device with a size of 2 GB, but in practice zram will stop after the total compressed memory is around 500 - 1000 MB (depending on your scenario, of course). Commands like swapon -s or Gnome's System Monitor show the uncompressed data size for the zram device, just like the DATA column of zramctl. Thankfully, in reality, zram does not actually use up the reported amount of memory. But this means that in practice you actually have to create a zram disk size that equals your RAM + 50% to take real advantage of it, and not a disk size that equals half of the RAM size, like zram-config incorrectly does. But read on to find out more.
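The mismatch is easy to observe with stock tools (column names are from zramctl; /dev/zram0 is assumed to be the active device):

    # USED here is counted in *uncompressed* pages handed to the swap device:
    swapon --show
    # zramctl separates DATA (uncompressed), COMPR (compressed payload) and
    # TOTAL (memory actually consumed, including allocator overhead):
    zramctl /dev/zram0
    # The raw counters behind those columns (orig_data_size, compr_data_size,
    # mem_used_total, mem_limit, ...) per the kernel zram documentation:
    cat /sys/block/zram0/mm_stat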

Here is the deeper background: why am I so sure? Because I tested this with zswap as well. I compiled my own kernel where I lowered the file_prio value inside mm/vmscan.c relative to anon_prio (in newer Linux 5.6 kernels the variables have been renamed to fp and ap respectively). The reduced file_prio value makes the kernel stop discarding valuable cache memory so aggressively. By default, even with vm.swappiness at 100, the kernel discards an insane amount of cached RAM data, both in standby memory and for active programs. The performance hit with the default configuration is extreme in memory-pressure situations, exactly when you want to make use of zram, because then you absolutely want the kernel to swap rarely used and highly compressible memory much more often. With more free memory, you have more space for cached data. Then cached data won't be thrown away at a ridiculously high rate, and Linux won't have to repeatedly reread program file caches it has purged. When testing this on classic hard drives, you can easily verify the performance impact.

Back to my zswap test: With my custom kernel, zswap got plenty of memory to compress once I hit the 50 - 70% memory mark. Gnome's System Monitor immediately showed high swap data usage for the swap partition, but oddly enough, there was no hard drive paging at all! This is of course by design: zswap swaps out least recently used memory on its own. But the interesting part is that the system reports such high swap usage for the swap partition anyway, so ultimately you are limited by the size of your swap partition or swap file. Even though all the memory is compressed, your swap has to be at least as large as the uncompressed data. Therefore, even if in practice 4 GB of swapped memory in zswap only uses up 1 - 2 GB, your swap needs to be sized for the uncompressed data. The same goes for zram, but there the memory is at least not actually reserved. Well, unless you use zswap with a dynamically growing swap file, of course.
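For anyone reproducing the zswap side of this test: zswap is configured through module parameters rather than a block device. A rough sketch (the lz4 algorithm and the 20% pool cap are example values and must be supported by your kernel):

    # Enable zswap in front of an existing disk-backed swap device. Unlike zram,
    # zswap still needs real backing swap sized for the *uncompressed* data.
    echo 1   > /sys/module/zswap/parameters/enabled
    echo lz4 > /sys/module/zswap/parameters/compressor        # example algorithm
    echo 20  > /sys/module/zswap/parameters/max_pool_percent  # pool cap: 20% of RAM
    grep -r . /sys/module/zswap/parameters/                   # inspect the result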

As for zram again, there is also a very interesting detail that backs up the observation I made:

There is little point creating a zram of greater than twice the size of memory since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the size of the disk when not in use so a huge zram is wasteful.

This means that to make effective use of zram, you have to create a disk size that at least equals your installed RAM. Due to the high compression ratios, I would suggest using your RAM size + 50%, but the quote above implies that it does not make much sense to go above +100%. Additionally, since we have to specify a disk size that matches the uncompressed data size, it is much harder to control and predict the actual real memory usage. Per the helpful official source above, we can limit the actual memory usage (which equals the TOTAL value of zramctl) with this command: echo 1G > /sys/block/zram0/mem_limit. But in reality, doing this will lock up the machine: the system still tries to swap to the device, zram enforces the limit, and the machine locks up with extremely high CPU usage. This behavior can't be intentional at all, which strengthens my impression that something about the whole story is very wonky.
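Two quick numbers that follow from the quote and the suggestion above (16 GiB of RAM is again just an example):

    # RAM + 50% sizing on a hypothetical 16 GiB machine:
    ram_gib=16
    echo "disksize: $(( ram_gib * 3 / 2 )) GiB"                       # 24 GiB
    # Idle metadata overhead of ~0.1% of disksize, per the kernel docs quote:
    echo "idle overhead: ~$(( ram_gib * 3 / 2 * 1024 / 1000 )) MiB"   # ~24 MiB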

To sum this up:

  • The disksize you set during zram device creation is essentially a virtual disk size; it does not correspond to real RAM usage.
  • You have to predict the actual RAM usage (compression ratio) for your scenario, or make sure you never create a zram disk size that is too large. Your RAM size + 50% should nearly always be fine in practice.
  • The default configuration of the Linux kernel is unfortunately totally unsuited for zram compression, even with vm.swappiness set to 100. You need to build your own custom kernel to actually make real use of this handy feature, since Linux purges far too many file caches instead of freeing memory by swapping out the most compressible data much earlier. Ironically, a helpful patch to fix this situation was never accepted.
  • Using the zram limit (echo 1G > /sys/block/zram0/mem_limit) will lock up your system once the compressed data reaches that threshold. You are better off limiting zram usage with a well-predicted zram disksize, as there seems to be no other way to impose a limit (a consolidated sketch follows this list).
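Putting the summary into a single snippet (same assumptions as the sketch earlier in this thread: /dev/zram0 exists and is unused; deliberately no mem_limit, per the last point):

    # Safer manual setup: cap zram via disksize alone, no mem_limit.
    mem_bytes=$(( $(awk '/^MemTotal/ {print $2}' /proc/meminfo) * 1024 ))
    echo $(( mem_bytes * 3 / 2 )) > /sys/block/zram0/disksize   # RAM + 50%
    mkswap /dev/zram0
    swapon --priority 100 /dev/zram0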

There are still Internet guides in 2022 that claim "zram should be half of the RAM's size", which is untrue. The kernel.org zram documentation is really insightful for understanding this; it doesn't suggest that sizing at all.

Pinging people who have previously contributed to the module: @NickCao @rapenne-s @lheckemann

NickCao commented 1 year ago

So we should not actually implement the limit, as it would lock up the system when the limit is hit (I doubt zram-generator even exposes this as an option). Instead we should fix the option description to reflect its true meaning, and we may as well raise the default value to 100% or even 150%, right?