NearNodeFlash / NearNodeFlash.github.io

View this document https://nearnodeflash.github.io/
Apache License 2.0

nnf-dm + clientmountd cpu usage #117

Open behlendorf opened 9 months ago

behlendorf commented 9 months ago

We've observed that the nnf-dm and clientmountd daemons generate a surprising amount of system noise on the computes even when they should be idle. For reference, over the last 2 weeks they've been lightly used yet have racked up ~10 minutes of CPU time each. By comparison, most other idle system daemons report <5 seconds of CPU usage over the same period. The nnf-dm and clientmountd usage is similar across compute nodes.

root       33954       1  0 Nov14 ?        00:09:44 /usr/bin/nnf-dm <...args...>
root       34061       1  0 Nov14 ?        00:10:11 /usr/bin/clientmountd <...args...>
                                           ^^^^^^^^^

Corosync, which is required for gfs2, generates even more noise on the compute nodes. One possible mitigation would be to start/stop the pacemaker service on computes only when a gfs2 filesystem has been requested. This could be done either by Flux when setting up the computes or by clientmountd, which is already running there.
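The clientmountd-side variant of that mitigation could look roughly like the sketch below. This is purely illustrative, not code from clientmountd; the `hasGFS2Mount` helper and the dry-run wiring are hypothetical, and a real implementation would invoke `systemctl` instead of printing the action.

```go
// Sketch: start pacemaker only when a gfs2 filesystem is mounted.
// Hypothetical illustration of the mitigation idea above.
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasGFS2Mount reports whether /proc/mounts-style data lists a filesystem
// of type gfs2 (the third whitespace-separated field of each line).
func hasGFS2Mount(mounts string) bool {
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[2] == "gfs2" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read /proc/mounts:", err)
		os.Exit(1)
	}
	// Dry run: print the action rather than actually running
	// exec.Command("systemctl", action, "pacemaker").Run().
	action := "stop"
	if hasGFS2Mount(string(data)) {
		action = "start"
	}
	fmt.Println("would run: systemctl", action, "pacemaker")
}
```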

bdevcich commented 9 months ago

We've been investigating this. Early signs point to Go Garbage Collection and HTTP2 Health Checks. Both can be tuned.

bdevcich commented 9 months ago

@behlendorf which version of go are you using to build the daemons?

behlendorf commented 9 months ago

We're building with the RHEL 8.9 version of go.

# rpm -q golang
golang-1.20.10-1.module+el8.9.0+20382+04f7fe80.x86_64

# go version
go version go1.20.10 linux/amd64
bdevcich commented 8 months ago

We have found that we can reduce the CPU usage by approximately 66% by tuning garbage collection and the frequency of the HTTP2 health checks. Compared to your original observation of ~10m of CPU time over 14 days, we were able to get CPU usage down to 3m26s over 12 days (22-Dec to 02-Jan). This can be done by setting the following environment variables in the systemd unit file:

Environment=GOGC=off
Environment=GOMEMLIMIT=20MiB
Environment=GOMAXPROCS=5
Environment=HTTP2_PING_TIMEOUT_SECONDS=60
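In practice these can go in a systemd drop-in rather than the shipped unit file, e.g. the `/etc/systemd/system/clientmountd.service.d/override.conf` visible in the status output. A sketch (the `CPUAccounting=yes` line is an assumption, added only to enable the CPU figure in `systemctl status`):

```ini
# /etc/systemd/system/clientmountd.service.d/override.conf
[Service]
Environment=GOGC=off
Environment=GOMEMLIMIT=20MiB
Environment=GOMAXPROCS=5
Environment=HTTP2_PING_TIMEOUT_SECONDS=60
# Assumed addition: report CPU time in systemctl status
CPUAccounting=yes
```

Followed by `systemctl daemon-reload && systemctl restart clientmountd` to pick up the change.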

Output of systemctl status with CPU accounting enabled:

[root@x9000c3s0b0n0 ~]# systemctl status clientmountd
● clientmountd.service - Data Workflow Service (DWS) Client Mount Service
   Loaded: loaded (/etc/systemd/system/clientmountd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/clientmountd.service.d
           └─override.conf
   Active: active (running) since Wed 2023-12-20 13:24:23 CST; 1 weeks 5 days ago
 Main PID: 7840 (clientmountd)
    Tasks: 10 (limit: 1646821)
   Memory: 19.0M
      CPU: 3min 25.808s
bdevcich commented 8 months ago

These environment variables have been checked into master in nnf-deploy for each daemon (i.e. nnf-dm, clientmountd). Those variables can be found here:

https://github.com/NearNodeFlash/nnf-deploy/pull/105

bdevcich commented 3 months ago

The current solution is to start/stop these daemons at will. Flux will do that: https://github.com/flux-framework/flux-coral2/issues/166