behlendorf opened 9 months ago
We've been investigating this. Early signs point to Go Garbage Collection and HTTP2 Health Checks. Both can be tuned.
@behlendorf which version of go are you using to build the daemons?
We're building with the RHEL 8.9 version of go.
```shell
# rpm -q golang
golang-1.20.10-1.module+el8.9.0+20382+04f7fe80.x86_64
# go version
go version go1.20.10 linux/amd64
```
We have found that we can reduce the CPU usage by approximately 66% by tuning garbage collection and the frequency of the HTTP2 health checks. Compared to your original observation of 10m of CPU time over 14 days, we were able to get CPU usage down to 3m26s over 12 days (22-Dec to 02-Jan). This can be done by setting the following environment variables in the systemd unit file:
```
Environment=GOGC=off
Environment=GOMEMLIMIT=20MiB
Environment=GOMAXPROCS=5
Environment=HTTP2_PING_TIMEOUT_SECONDS=60
```
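For reference, a minimal systemd drop-in sketch carrying these settings (the unit and override names here are assumptions; adjust to your deployment — `CPUAccounting` is optional but makes the per-unit CPU figure visible in `systemctl status`):

```ini
# /etc/systemd/system/clientmountd.service.d/override.conf (sketch)
[Service]
Environment=GOGC=off
Environment=GOMEMLIMIT=20MiB
Environment=GOMAXPROCS=5
Environment=HTTP2_PING_TIMEOUT_SECONDS=60
# Optional: expose per-unit CPU time in `systemctl status`
CPUAccounting=yes
```

After adding the drop-in, run `systemctl daemon-reload` and restart the unit for the new environment to take effect.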
Output of `systemctl status` with CPU accounting enabled:
```shell
[root@x9000c3s0b0n0 ~]# systemctl status clientmountd
● clientmountd.service - Data Workflow Service (DWS) Client Mount Service
     Loaded: loaded (/etc/systemd/system/clientmountd.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/clientmountd.service.d
             └─override.conf
     Active: active (running) since Wed 2023-12-20 13:24:23 CST; 1 weeks 5 days ago
   Main PID: 7840 (clientmountd)
      Tasks: 10 (limit: 1646821)
     Memory: 19.0M
        CPU: 3min 25.808s
```
These environment variables have been checked into master in nnf-deploy for each daemon (i.e. nnf-dm, clientmountd). Those variables can be found here.
The current solution is to start/stop these daemons at will. Flux will do that: https://github.com/flux-framework/flux-coral2/issues/166
We've observed that the `nnf-dm` and `clientmountd` daemons generate a surprising amount of system noise on the computes even when they should be idle. For reference, over the last 2 weeks they've been lightly used yet have racked up ~10 minutes of CPU time each. This is compared to most other idle system daemons, which report <5 seconds of CPU usage over the same time period. The `nnf-dm` and `clientmountd` usage is similar across compute nodes.

Corosync, which is required for gfs2, generates even more noise on the compute nodes. One possible mitigation would be to only start/stop the pacemaker service on computes when a gfs2 filesystem has been requested. This could be done either by Flux when setting up the computes or by `clientmountd`, which is already running there.
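The start/stop gating could be as simple as the sketch below. This is a hypothetical illustration, not anything in the repo: `manage_pacemaker` checks whether a gfs2 filesystem is mounted and emits the corresponding action; the first argument lets you dry-run with `echo` instead of the real `systemctl`:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed mitigation: run pacemaker (and with it
# corosync) only while a gfs2 filesystem is in use on this compute node.
manage_pacemaker() {
    ctl="$1"   # systemctl for real use, echo for a dry run
    if grep -qw gfs2 /proc/mounts 2>/dev/null; then
        "$ctl" start pacemaker
    else
        "$ctl" stop pacemaker
    fi
}

# Dry run: print the action that would be taken instead of taking it.
manage_pacemaker echo
```

In practice the same check would live in `clientmountd` (which already sees mount/unmount requests) or in Flux's prolog/epilog, rather than in a standalone script.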