cockroachdb / docs

CockroachDB user documentation
https://cockroachlabs.com/docs
Creative Commons Attribution 4.0 International

Production Checklist: Add OS settings #4209

Open lnhsingh opened 5 years ago

lnhsingh commented 5 years ago

Lauren Singh (lnhsingh) commented:

@drewdeally: for OS settings, I will work on a doc that covers all OS setting recommendations and their implications, tested and validated by the eng team, such as the following:

echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
echo "vm.overcommit_ratio = 99" >> /etc/sysctl.conf
echo "vm.swappiness = 0" >> /etc/sysctl.conf # Or set to 0 to disable swap. Overcommit mixed with this can have swap implications
echo "vm.max_map_count=1000000000" >> /etc/sysctl.conf
echo "vm.min_free_kbytes=500000" >> /etc/sysctl.conf
echo "net.core.somaxconn=1024" >> /etc/sysctl.conf
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.d/rc.local
echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.d/rc.local

Related to #4153.

Jira Issue: DOC-222

jseldess commented 5 years ago

@lhirata, was this part of your product checklist work, or are we waiting for details from @drewdeally?

drewdeally commented 5 years ago

Oh, please hold off. I need to validate these with some content.

On Feb 28, 2019, at 12:52 PM, Jesse Seldess notifications@github.com wrote:

@lhirata, was this part of your product checklist work, or are we waiting for details from @drewdeally?

drewdeally commented 4 years ago

The vm.overcommit_memory = 2 setting written to /etc/sysctl.conf will need to be reevaluated. Setting it to 0 seems to be safe.

jhatcher9999 commented 4 years ago

not sure if this is helpful or not, but Datastax has a lot of these same recommended settings and might have some good general descriptions that we could leverage: https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/config/configRecommendedSettings.html

drewdeally commented 4 years ago

Agreed. I started a document early on (https://docs.google.com/document/d/171XnKag1xkIh59MkU8g8_7abg0pQfV2O7nUfTIixssA/edit). But before adding this to the docs, we need to test the settings and their behavior as they relate to CRDB.

@Lauren and I were to pick this up last year.

On Aug 13, 2020, at 1:06 PM, jhatcher9999 notifications@github.com wrote:

not sure if this is helpful or not, but Datastax has a lot of these same recommended settings and might have some good general descriptions that we could leverage: https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/config/configRecommendedSettings.html

lauren commented 4 years ago

@drewdeally wrong Lauren.

drewdeally commented 4 years ago

Sorry -- corrected @lnhsingh

jseldess commented 4 years ago

@drewdeally, you closed this issue some time ago. Should this be re-opened? @taroface has taken over this work in the meantime.

taroface commented 4 years ago

Reopened per @fabiog1901

lancel66 commented 3 years ago

Hi, this just came up in a discussion with CRL CEAs. We need to update our live training decks with the latest recommended Linux settings, so this would be helpful. Also, there is some concern that vm.swappiness=0 does not completely disable swap and has caused issues, as noted in the code comments above. It should probably be replaced with swapoff -a plus removing the swap device/file. cc @fabiog1901
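To make the distinction concrete: vm.swappiness only tunes how eagerly the kernel swaps, while /proc/swaps lists the swap areas that are actually active. Here is a minimal read-only sketch for checking the latter. The helper name swap_active_count is hypothetical, and the optional file argument exists only so the check can be pointed at a test file.

```shell
# Sketch: count active swap areas, independent of vm.swappiness.
# swap_active_count is a hypothetical helper, not part of any CockroachDB tooling.
swap_active_count() {
  local swaps_file="${1:-/proc/swaps}"
  # /proc/swaps has one header line; each following non-empty line is an active swap area.
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$swaps_file"
}
```

A count of 0 after running swapoff -a (and removing the swap entry from /etc/fstab so it does not return on reboot) would indicate that swap is really off, whatever vm.swappiness is set to.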

drewdeally commented 3 years ago

I currently check the following with customers:


sudo sysctl -a | grep -E 'overcommit_memory|swappiness|overcommit_ratio'
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.swappiness = 0

It would be great to have updated documents as a reference.

lancel66 commented 3 years ago

Here is an issue experienced with one of our customers as outlined by Jon S. J. (CEA) related to this:

overcommit_memory=2 ## Disable OOM

This, combined with the default value of overcommit_ratio = 50, was resulting in an inability for the Pebble cache to allocate new memory and a panic(), crashing crdb randomly. My understanding is that this out-of-memory situation is a risk when cockroach is started with --cache=.25 and --max-sql-memory=.25, which means that it will eventually hit the overcommit_ratio (one being represented as a % and the other as a fraction). When the process attempts to get memory after it has hit the overcommit_ratio, the request is denied, but the process is not killed by the system. It is left to handle the situation itself, which in crdb’s case is to exit.

I don’t think overcommit_memory=2 is part of the standard documentation, so we were trying to figure out where that recommendation came from and whether it should be removed from the configuration and sizing deck/workshop, since it could cause some issues.
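The arithmetic behind this report can be sketched roughly as follows. With vm.overcommit_memory=2, the kernel caps committed address space at CommitLimit = swap + RAM * overcommit_ratio / 100 (huge pages are ignored here for simplicity). The helper names and the 64 GiB node below are assumptions for illustration only, not measurements from the customer.

```shell
# Sketch only: both helpers are hypothetical and ignore huge pages.

# commit_limit_kb RAM_KB SWAP_KB RATIO -> CommitLimit in kB under vm.overcommit_memory=2
commit_limit_kb() {
  local ram_kb="$1" swap_kb="$2" ratio="$3"
  echo $(( ram_kb * ratio / 100 + swap_kb ))
}

# cockroach_pools_kb RAM_KB CACHE_PCT SQL_PCT -> combined size of the two configured pools in kB
cockroach_pools_kb() {
  local ram_kb="$1" cache_pct="$2" sql_pct="$3"
  echo $(( ram_kb * (cache_pct + sql_pct) / 100 ))
}

# Assumed example: 64 GiB of RAM (67108864 kB), no swap, default overcommit_ratio of 50.
commit_limit_kb 67108864 0 50      # prints 33554432
cockroach_pools_kb 67108864 25 25  # prints 33554432
```

In this assumed setup the two pools alone add up to the entire commit limit, before counting the Go runtime, other processes, or the fact that commit accounting tracks reserved address space rather than resident memory. That is consistent with allocations being refused well before physical memory is exhausted.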

fabiog1901 commented 3 years ago

We have a line that says "Disable Linux memory swapping." We should leave it up to the customer/sysadmin to run the necessary commands to disable swap.

As for overcommit_memory, I look forward to our engineering team's response. Until then, we should not mention anything in our presentations and docs (the docs, in fact, don't mention anything about it).

taroface commented 3 years ago

Hi @fabiog1901, should I take this to mean that we shouldn't document these settings publicly?

Per the original issue description, are there other OS settings that we should be documenting?

fabiog1901 commented 3 years ago

Personally, I wouldn't. We should leave it up to the customer to know what commands to run on their OS to achieve the desired result. We might show them as an example, but not as a guide. I would definitely like to see guidance on the 3 settings, though, much like we have for swappiness: we don't give the commands to disable it, we just say it in plain English: "Disable Linux memory swapping."

Ultimately, our PM/ENG for Security should make the call.

lancel66 commented 3 years ago

Hi @fabiog1901, should I take this to mean that we shouldn't document these settings publicly?

Per the original issue description, are there other OS settings that we should be documenting?

I highly recommend that we give examples of what settings to apply on the platforms we support. All of the customers I have dealt with have asked for this and you can find them for any other major database platform.

taroface commented 3 years ago

@mwang1026 Could you advise on how we might proceed here?

jonstjohn commented 3 years ago

Confusion about overcommit_memory, in conjunction with an incorrectly chosen overcommit_ratio, led to frequent (every 6-8 hours) cockroach node crashes for one particular customer. This somewhat obscure issue took a couple of weeks to resolve.

Although the production checklist in the official docs is great at a high level, there are quite a few ways that a customer can get it wrong. Ideally, I think we would give some concrete examples of optimal settings for major OSes. We do a pretty good job of this in the file descriptors limit section. Maybe it would be helpful to do something similar for memory settings.

taroface commented 3 years ago

As a compromise until we can get PM/ENG guidance on best recommended settings, could we add a callout to the Production Checklist that lists what not to do here? Are there specific combinations of overcommit_memory, overcommit_ratio, --cache, and --max-sql-memory that we can warn against using?

mwang1026 commented 3 years ago

cc @piyush-singh

piyush-singh commented 3 years ago

Discussed with the Server team today. We're going to work with our Cloud team to understand how they configure these parameters and publish guidance with those values. I think if we see other database vendors publishing guidance on these values, we should follow suit. Leaving these as an exercise for the reader to determine seems to already be causing problems, so I'd like to be explicit about what setups we consider valid/healthy. Will update here shortly.

exalate-issue-sync[bot] commented 2 years ago

Andrew Feierabend (andf-crl) commented: Red Hat and its clones used to name their THP paths like so:

/sys/kernel/mm/redhat_transparent_hugepage/enabled
/sys/kernel/mm/redhat_transparent_hugepage/defrag

Not sure if this is the case with the latest RHEL/clones.

Depending on the system configuration, THP likes to re-enable itself, especially when used alongside system performance tuners like ktune or tuned. I’ve seen DB companies have users create an init.d service file or systemd unit file to configure this automatically on boot, similar to the proposed rc.d lines in this ticket’s description. Here’s MDB, for example, with on-boot THP disablement and explicit ktune and tuned modification to avoid interplay with THP.

HTH!
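Since the THP directory name can differ and the setting can silently revert, a read-only check that tries both locations may be useful. This is a minimal sketch; the helper names are hypothetical, and the optional root argument exists only so the functions can be exercised against a test directory.

```shell
# Sketch only: hypothetical helpers for inspecting THP state without changing it.

# thp_enabled_path [ROOT] -> prints the first readable THP "enabled" file under ROOT
thp_enabled_path() {
  local root="${1:-/sys/kernel/mm}" dir
  for dir in transparent_hugepage redhat_transparent_hugepage; do
    if [ -r "$root/$dir/enabled" ]; then
      echo "$root/$dir/enabled"
      return 0
    fi
  done
  return 1
}

# thp_active_value FILE -> prints the bracketed (active) value, e.g. "never"
# The kernel marks the active choice with brackets, as in: always madvise [never]
thp_active_value() {
  sed -n 's/.*\[\([a-z+]*\)].*/\1/p' "$1"
}
```

Running thp_active_value "$(thp_enabled_path)" after boot, and again after tuned or ktune has applied a profile, would show whether the on-boot disablement actually stuck.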

exalate-issue-sync[bot] commented 2 years ago

Jessie Lin (lin-crl) commented: Roachprod has used a cgroup with MemoryMax set at 95% for over a year (implemented in https://github.com/cockroachdb/cockroach/commit/b937cc8ffe08e0da9694faab38d88fabd4da464e). All automated roachtest/benchmark runs use this setting. Shall we update the docs with this setting?

sudo systemd-run --unit cockroach \
  --same-dir --uid "$(id -u)" --gid "$(id -g)" \
  --service-type=notify -p NotifyAccess=all \
  -p "MemoryMax=${MEMORY_MAX}" \
  -p LimitCORE=infinity \
  -p "LimitNOFILE=${NUM_FILES_LIMIT}" \
  bash "${0}" run

lin-crl commented 2 years ago

From discussion w/ @bobvawter, insights shared: CC dedicated does a 90% allocation. For production clusters, we may want to reserve more space for metrics collectors, package updates, etc.

exalate-issue-sync[bot] commented 2 years ago

Jessie Lin (lin-crl) commented: From discussion w/ @bobvawter, insights shared: CC dedicated sets a limit of 90% allocation, to reserve space for metrics collectors, package updates, etc. on production clusters. Another recommendation is to use a cgroup to set a limit on memory usage.

Is this something we could work together to formalize and update the docs with? Currently the docs don’t recommend an upper limit on OS memory settings, and we have seen OOM / memory exhaustion at several customers.
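For reference, turning a percentage such as the 90% or 95% mentioned above into a byte value for a cgroup limit can be sketched like this. The helper name memory_max_bytes is hypothetical, and the example input in the comment is an assumed machine size, not a value from Roachprod or CC.

```shell
# Sketch only: memory_max_bytes is a hypothetical helper.
# memory_max_bytes PERCENT [MEMINFO_FILE] -> PERCENT of MemTotal, in bytes
memory_max_bytes() {
  local percent="$1" meminfo="${2:-/proc/meminfo}" total_kb
  # MemTotal is reported in kB, e.g. "MemTotal:       16384000 kB" on an assumed ~16 GB host.
  total_kb="$(awk '/^MemTotal:/ { print $2 }' "$meminfo")"
  echo $(( total_kb * 1024 * percent / 100 ))
}
```

The result could then be supplied as the MemoryMax property of a systemd unit, for example MEMORY_MAX="$(memory_max_bytes 90)". Which percentage is appropriate for production is exactly the open question in this thread, so the 90 here is only a placeholder.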

exalate-issue-sync[bot] commented 10 months ago

Richard Loveland (rmloveland) commented: Kernel settings came up today in the KV on-call meeting. For details see my comment here

exalate-issue-sync[bot] commented 7 months ago

Kevin Kokomani (kevinkokomani) commented: This has come up again with a customer question: https://cockroachdb.zendesk.com/agent/tickets/21379

bschoening commented 7 months ago

MongoDB has similar recommendations to DataStax in their production-notes

One key issue we hit (which is discussed in the Mongo doc) is RedHat issue #6785021, "Premature swapping while there is still plenty of pagecache to be reclaimed", where swappiness may be ignored if vm.force_cgroup_v2_swappiness isn't set correctly.