basho / basho_docs

Basho Products Documentation
http://docs.basho.com

Invalid / confusing sysctl settings are suggested #2519

Open jaroslawr opened 6 years ago

jaroslawr commented 6 years ago

At http://docs.basho.com/riak/kv/2.2.3/using/performance/#optional-i-o-settings, section "Optional I/O Settings", the following sysctl settings are suggested as a possible optimization for specific workloads:

vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200

This is misleading: according to https://www.kernel.org/doc/Documentation/sysctl/vm.txt, the dirty_background_ratio/dirty_ratio pair of settings is mutually exclusive with the dirty_background_bytes/dirty_bytes pair. Writing a setting from one pair overrides its counterpart in the other pair:

Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be specified at a time. When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and the other appears as 0 when read.
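Because the inactive member of each pair reads as 0, you can check which limit is actually in effect on a machine by reading the values back. The sketch below assumes the standard Linux /proc/sys/vm layout (one file per setting, containing an integer); the function name `active_limits` is hypothetical, and the directory is a parameter only so it can be pointed somewhere other than the live system.

```python
from pathlib import Path

# Each ratio setting and its byte-valued counterpart (per vm.txt).
PAIRS = [
    ("dirty_background_ratio", "dirty_background_bytes"),
    ("dirty_ratio", "dirty_bytes"),
]


def active_limits(vm_dir="/proc/sys/vm"):
    """Return the name and value of the setting in effect for each pair.

    Hypothetical helper: relies on the documented behaviour that the
    counterpart which is not in use reads as 0.
    """
    base = Path(vm_dir)
    result = {}
    for ratio_name, bytes_name in PAIRS:
        ratio = int((base / ratio_name).read_text())  # int() strips the newline
        nbytes = int((base / bytes_name).read_text())
        if nbytes:
            result[bytes_name] = nbytes  # the byte limit is active
        else:
            result[ratio_name] = ratio  # otherwise the ratio limit is active
    return result
```

On a Linux host, calling `active_limits()` with no argument reads the live values.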

Additionally, according to the same document, dirty_bytes cannot be set to zero:

Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any value lower than this limit will be ignored and the old configuration will be retained.
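To make the interaction of these two rules concrete, here is a minimal simulation of applying the suggested settings in the order listed. It is a hypothetical model of the quoted documentation, not the kernel's code: it assumes 4 KiB pages, assumes the common kernel defaults as the starting state, zeroes the counterpart on every accepted write, and models only the dirty_bytes minimum.

```python
# Hypothetical model of the two rules quoted from Documentation/sysctl/vm.txt;
# this is NOT the kernel's implementation.
#   1. *_ratio and *_bytes are counterparts: writing one makes the other read 0.
#   2. A dirty_bytes value below two pages is ignored; the old config is kept.

PAGE_SIZE = 4096  # assumption: 4 KiB pages
DIRTY_BYTES_MIN = 2 * PAGE_SIZE

COUNTERPART = {
    "vm.dirty_background_ratio": "vm.dirty_background_bytes",
    "vm.dirty_background_bytes": "vm.dirty_background_ratio",
    "vm.dirty_ratio": "vm.dirty_bytes",
    "vm.dirty_bytes": "vm.dirty_ratio",
}

# Assumed starting point: common kernel defaults.
DEFAULTS = {
    "vm.dirty_background_ratio": 10,
    "vm.dirty_background_bytes": 0,
    "vm.dirty_ratio": 20,
    "vm.dirty_bytes": 0,
    "vm.dirty_writeback_centisecs": 500,
    "vm.dirty_expire_centisecs": 3000,
}

# The settings suggested by the docs, in the order they are listed.
SUGGESTED = [
    ("vm.dirty_background_ratio", 0),
    ("vm.dirty_background_bytes", 209715200),
    ("vm.dirty_ratio", 40),
    ("vm.dirty_bytes", 0),
    ("vm.dirty_writeback_centisecs", 100),
    ("vm.dirty_expire_centisecs", 200),
]


def apply_settings(settings, state):
    """Apply (name, value) writes in order and return the resulting state."""
    state = dict(state)
    for name, value in settings:
        if name == "vm.dirty_bytes" and value < DIRTY_BYTES_MIN:
            continue  # rule 2: write ignored, previous configuration retained
        state[name] = value
        if name in COUNTERPART:
            state[COUNTERPART[name]] = 0  # rule 1: counterpart now reads 0
    return state


effective = apply_settings(SUGGESTED, DEFAULTS)
for name, value in effective.items():
    if value:  # skip settings that read as 0
        print(f"{name} = {value}")
```

Running it prints:

vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200

Under this model, both zero-valued lines are effectively no-ops, and vm.dirty_ratio = 40 stays in effect because the later vm.dirty_bytes = 0 is rejected.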

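In the end, applied in the order listed, the settings effectively amount to just the following (vm.dirty_ratio = 40 survives, because the later vm.dirty_bytes = 0 is below the two-page minimum and is ignored):

vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200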
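For readers unfamiliar with these units, the short sketch below converts the byte and centisecond values listed above into MiB and seconds. The meanings in the comments paraphrase the descriptions in the same vm.txt; the variable names are illustrative only.

```python
# Values taken from the settings listed above.
EFFECTIVE = {
    "vm.dirty_background_bytes": 209715200,
    "vm.dirty_writeback_centisecs": 100,
    "vm.dirty_expire_centisecs": 200,
}

# dirty_background_bytes: amount of dirty memory at which the background
# flusher threads start writeback.
background_mib = EFFECTIVE["vm.dirty_background_bytes"] / 2**20
# dirty_writeback_centisecs: interval between flusher thread wakeups.
writeback_s = EFFECTIVE["vm.dirty_writeback_centisecs"] / 100
# dirty_expire_centisecs: age at which dirty data becomes eligible for writeout.
expire_s = EFFECTIVE["vm.dirty_expire_centisecs"] / 100

print(f"background writeback starts at {background_mib:.0f} MiB of dirty data")
print(f"flusher threads wake up every {writeback_s:.1f} s")
print(f"data dirty for longer than {expire_s:.1f} s is written out")
```

That is 200 MiB, 1 s and 2 s. For comparison, the usual kernel defaults for the last two settings are 500 and 3000 (5 s and 30 s), so these values make the kernel flush dirty pages considerably more often.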

Actually applying these settings seems to cause frequent I/O pauses, which stall CPU-bound work on the same machine. Perhaps the docs should explain in more detail what these settings do, or perhaps this recommendation should not be made at all, since it is very specific to a particular use case?

How I found this: I am not even using Riak, but someone mindlessly applied these settings to one of our servers. As a result, that server, which in addition to running (another) database was also running some CPU-intensive work, experienced what looked like random slowdowns. Obviously the docs are not to blame for this rather stupid situation, but I do think this section is more likely to confuse than to help.