ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

persistent disabling of transparent hugepages #1013

Closed bengland2 closed 5 years ago

bengland2 commented 8 years ago

The ceph-ansible kernel tuning YAML disables transparent hugepages (THP) by echoing to sysfs, but this setting does not persist across reboots. Note that the disable_transparent_hugepage var is True by default. The YAML to disable THP on the live system is here.

See this article about the problems with persisting this setting across reboots. Since it's in sysfs under /sys/kernel/mm/transparent_hugepage/enabled and not under /proc/sys/, you can't add it to sysctl.conf, right? A tuned profile for Ceph would solve this for Fedora, CentOS and RHEL, but this would not work for Ubuntu, would it? It could also be implemented as a boot-time systemd unit file (my favorite, since it is now portable across distros and can run early in the boot cycle, before Ceph starts).
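The boot-time unit idea could look roughly like the sketch below. It is only an illustration: the unit name disable-thp.service and the write_thp_unit helper are hypothetical and not part of ceph-ansible, and the ordering against basic.target is an assumption about what "early in the boot cycle" should mean here.

```shell
#!/bin/bash
# Sketch: write a oneshot systemd unit that disables THP early in boot.
# "disable-thp.service" and write_thp_unit are hypothetical names.
# The target directory is a parameter so the function can be tried in a scratch dir.
write_thp_unit() {
    local unit_dir="${1:-/etc/systemd/system}"
    cat > "$unit_dir/disable-thp.service" <<'EOF'
[Unit]
Description=Disable transparent hugepages
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=basic.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

[Install]
WantedBy=basic.target
EOF
}
```

On a real host this would be followed by `systemctl daemon-reload` and `systemctl enable --now disable-thp.service`.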

Here is a kernel article that exhaustively discusses THP, including pros and cons.

leseb commented 7 years ago

I won't say no to a tuned profile for Red Hat based systems, Ben. I agree this kernel tuning should survive reboots.

Do you mind sending a PR for this?

bengland2 commented 7 years ago

Perhaps this could be a 1-line add to an existing unit file installed by Ceph ansible, but we should only disable THP where it's needed, for OSD hosts that thrash memory.

bengland2 commented 7 years ago

The best I can do so far is to add this to templates/ceph-osd.service.j2:

{% if disable_transparent_hugepage -%}
ExecStartPre=-/usr/lib/ceph/ceph-disable-thp.sh
{% endif -%}

where this script contains:

#!/bin/bash
echo never > /sys/kernel/mm/transparent_hugepage/enabled
logger "disabled transparent huge pages"

ok?
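A slightly more defensive variant of that script might look like the sketch below. It is an assumption-laden illustration, not the script that was proposed: the disable_thp function name and its optional path argument are hypothetical, and it uses echo instead of logger on the assumption that systemd captures the output of ExecStartPre commands in the journal.

```shell
#!/bin/bash
# Sketch of a defensive THP-disabling helper (disable_thp is a hypothetical name).
# The path argument exists only so the function can be exercised on a plain file.
disable_thp() {
    local thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # Kernels built without THP do not expose this file; treat that as success.
    if [ ! -e "$thp_file" ]; then
        echo "THP control file not found, nothing to do: $thp_file"
        return 0
    fi
    # Writing "never" selects the mode; sysfs then reads back "always madvise [never]".
    if ! echo never > "$thp_file"; then
        echo "failed to write to $thp_file" >&2
        return 1
    fi
    echo "disabled transparent huge pages via $thp_file"
}
```

Called with no argument, it writes to the real sysfs file and therefore needs root.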

leseb commented 7 years ago

@bengland2 templates/ceph-osd.service.j2 is only for container-based deployments, so this won't work for traditional non-containerized deployments.

bengland2 commented 6 years ago

Turns out that the tuned package is available in some form on Ubuntu, not just RHEL/CentOS/Fedora. So we could use a tuned profile to solve this problem. For example, on my F26 laptop you can install the tuned-utils RPM and see the profile /usr/lib/tuned/network-latency/tuned.conf, which shows that the network-latency tuned profile turns off transparent_hugepages. We could put together a Ceph tuned profile that did this, as well as other things currently done by ceph-ansible. Would this be an acceptable solution?

bengland2 commented 6 years ago

I verified that in OpenStack, transparent hugepages are not permanently disabled, because of this problem. However, this solves it for a RHEL7 host:

[root@overcloud-cephstorage-0 ceph-osd]# tuned-adm active
Current active profile: throughput-performance

[root@overcloud-cephstorage-0 ceph-osd]# cat /sys/kernel/mm/transparent_hugepage/enabled 
[always] madvise never

[root@overcloud-cephstorage-0 ceph-osd]# tuned-adm profile ceph-osd

[root@overcloud-cephstorage-0 ceph-osd]# tuned-adm active
Current active profile: ceph-osd

[root@overcloud-cephstorage-0 ceph-osd]# cat /sys/kernel/mm/transparent_hugepage/enabled 
always madvise [never]

[root@overcloud-cephstorage-0 ceph-osd]# pwd
/usr/lib/tuned/ceph-osd

[root@overcloud-cephstorage-0 ceph-osd]# cat tuned.conf

[main]
# this has some of same things as latency-performance,
# without disabling power management
summary=focused on low latency ceph I/O
...
[vm]
transparent_hugepages=never

leseb commented 6 years ago

@bengland2 sounds like a valid solution to me, but for non-containerized clusters only, I'm afraid. Do you want to submit a PR for this?

bengland2 commented 6 years ago

@leseb, if we run tuned-adm profile ceph-osd on the host, then all the containers on the host inherit that change. I verified this on an OSP12 system just now: I used docker exec to connect to a container, then checked the value of /sys/kernel/mm/transparent_hugepage/enabled inside it. When I enable/disable THP on the host (outside the container) by echoing to /sys/kernel/mm/transparent_hugepage/enabled, the value changes inside the container too. So the OSD role can set a tuned profile on the OSD hosts for Ceph that includes disabling transparent hugepages, and this would work for both bare-metal and containerized Ceph as long as the tuned RPM is present and the tuned profile is installed on the host. Agreed?
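To compare the host and container values without eyeballing the brackets, a small helper along these lines could extract the active mode. This is a sketch only: thp_active_mode is a hypothetical name, and the path argument is there so it can be pointed at a copy of the file.

```shell
#!/bin/bash
# Sketch: print the active THP mode, i.e. the bracketed word in the sysfs file.
# thp_active_mode is a hypothetical helper name.
thp_active_mode() {
    local thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # The kernel lists all modes and brackets the active one,
    # e.g. "always madvise [never]".
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$thp_file"
}
```

Running the same function on the host and inside a container (for example via docker exec) should print the same mode if the setting is shared as described.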

Since the tuned profile we need is not part of RHEL/CentOS, should it be part of ceph-ansible? ceph-ansible could copy it or softlink it to /usr/lib/tuned/, where the other profiles are, and then ensure that the tuned RPM is installed. At that point it should work on RHEL/CentOS.
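As a rough sketch of that install step, the function below writes a minimal ceph-osd profile under a tuned profile directory. It contains only the two settings quoted earlier in this thread; the elided parts of the real profile are deliberately not reproduced, and install_ceph_tuned_profile is a hypothetical name.

```shell
#!/bin/bash
# Sketch: install a minimal "ceph-osd" tuned profile.
# install_ceph_tuned_profile is a hypothetical helper; the profile body holds
# only the settings shown in this thread, not the full profile.
install_ceph_tuned_profile() {
    local tuned_root="${1:-/usr/lib/tuned}"
    local profile_dir="$tuned_root/ceph-osd"
    mkdir -p "$profile_dir" || return 1
    cat > "$profile_dir/tuned.conf" <<'EOF'
[main]
summary=focused on low latency ceph I/O

[vm]
transparent_hugepages=never
EOF
}
```

With the tuned package installed, the profile would then be activated with `tuned-adm profile ceph-osd`, as in the session shown above.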

leseb commented 6 years ago

Agreed. When I said this is a valid solution for non-containerized clusters only, I was assuming that tuned-adm won't be present, which might be the case on Atomic, for example.

bengland2 commented 6 years ago

I'm not aware that Atomic is a requirement for ceph-ansible, am investigating. Tim Wilkinson is going to try to run some tests with this tuned profile. The other concern here is HCI, where we are turning off THP not only for Ceph but also for guests. This is not ideal - some application workloads benefit from THP. But if that's true, then why does ceph-ansible turn off THP today? Still I think it's a good default, and tuned profiles can be overridden where necessary. @fultonj any comments?

bengland2 commented 6 years ago

At this point, my suggestion would be to remove ceph-ansible tuning of THP since it is not persistent anyway. If we're ever going to do THP tuning, do it with a tuned profile that persists across reboots. Same for other kernel tuning params. We may want to disable KSM (kernel same-page merging) as well, again something that sysctl doesn't allow.

Tim Wilkinson ran tests on RHOSP HCI with tuned profiles that did not show conclusive gains. The methodology needs to be improved to ensure that reboots are done between runs with different tuned profiles. I think we also need to incorporate an all-flash configuration to see the maximum impact this may have.

This might be a reason NOT to disable transparent huge pages: bluestore uses huge pages to lower TLB misses. Radoslaw Zarzynski is making the buffer allocation size the same as the huge page size. @rzarzynski would disabling THP hurt your change? You did see a performance boost with this, correct? (I heard you talking about it in the Ceph upstream perf. mtg.) In what configuration?

leseb commented 6 years ago

Disabling THP actually persists since https://github.com/ceph/ceph-ansible/commit/334d4cb885616ede72ceca6fcad95662040e1640. We can add a condition to not do that when objectstore is bluestore.

bengland2 commented 6 years ago

This seems like the right solution to me. Transparent hugepages can benefit some application workloads, as the Perf & Scale team has measured. So for hyperconverged storage with Bluestore it would be great if we didn't have to disable THP. Once this solution has been verified, we should close this issue.

guits commented 5 years ago

Closed due to inactivity, feel free to re-open if needed. Thanks!