NearNodeFlash / NearNodeFlash.github.io


Adjust update strategy for NNF DaemonSets #118

Closed · bdevcich closed this 4 months ago

bdevcich commented 5 months ago

From @behlendorf:

> It took a significant amount of time to kill and restart all the pods when I added in the merced filesystem, since it ran through them sequentially. Thankfully that's a one-time thing, but it seems it will make redeploying slow.

When you add or remove a lustrefilesystem resource, the nnf-dm-manager-controller sees that and adds/removes the corresponding Volume and VolumeMount on the nnf-dm-worker DaemonSet. Kubernetes handles it from there, restarting the nnf-dm-worker pods on each rabbit to mount/unmount that filesystem. The DaemonSet defines the following updateStrategy:

    updateStrategy:
      rollingUpdate:
        maxSurge: 0
        maxUnavailable: 1
      type: RollingUpdate

That maxUnavailable: 1 is what is causing the sequential behavior. We'll need to tweak this.
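
(For scale: with maxUnavailable: 1 a rollout is strictly sequential, so a system with, say, 50 rabbits pays 50 kill-and-restart cycles, one node at a time, for every filesystem added or removed.)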

bdevcich commented 4 months ago

We need to consider the same for lustre-csi-driver. There may be other areas where this applies as well.
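
One quick way to survey which DaemonSets on a system still carry the sequential default (a generic kubectl sketch, nothing NNF-specific assumed):

    kubectl get daemonsets -A \
      -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,MAXUNAVAILABLE:.spec.updateStrategy.rollingUpdate.maxUnavailable'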

bdevcich commented 4 months ago

@behlendorf, I'm making changes to set this to a sane default for our three daemonsets.

maxUnavailable can be set to a number of nodes/pods or to a percentage. Setting it to 100% would attempt a restart on all the nodes at the same time, 50% would do half, 25% a quarter, and so on. Do you have a preference for what the percentage (or hard number) should be?

This value will be adjustable for each system.
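
For illustration, the percentage form drops into the same updateStrategy block quoted above; 25% here is just one of the options mentioned, not a decided default:

    updateStrategy:
      rollingUpdate:
        maxSurge: 0
        maxUnavailable: "25%"
      type: RollingUpdate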

behlendorf commented 4 months ago

@bdevcich, as an initial swag, how about 25%? That seems like a reasonable compromise between propagating the changes rapidly and potentially overwhelming the system / container repository / other. Then we can tune on a per-system basis and revisit the default as needed.

bdevcich commented 4 months ago

> @bdevcich, as an initial swag, how about 25%? That seems like a reasonable compromise between propagating the changes rapidly and potentially overwhelming the system / container repository / other. Then we can tune on a per-system basis and revisit the default as needed.

Perfect, that's the percentage that I've been playing around with in my testing.
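
For anyone applying this by hand before the defaults land, a one-off patch along these lines works; the namespace below is an assumption, so verify it against the actual deployment:

    # DaemonSet name is from this thread; the namespace is a guess for your system.
    kubectl patch daemonset nnf-dm-worker -n nnf-dm-system --type merge \
      -p '{"spec":{"updateStrategy":{"rollingUpdate":{"maxUnavailable":"25%"}}}}'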

bdevcich commented 4 months ago

PRs here to default these all to 25%:

For nnf-dm, the NnfDataMovementManager resource is edited rather than the DaemonSet directly; the manager is responsible for managing the DaemonSet.
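
As a rough sketch of that indirection (the resource name and spec path below are assumptions about where NnfDataMovementManager surfaces the DaemonSet's update strategy; the PRs above are authoritative):

    # Hypothetical field path; consult the NnfDataMovementManager CRD for the real one.
    kubectl patch nnfdatamovementmanager nnf-dm-manager -n nnf-dm-system --type merge \
      -p '{"spec":{"updateStrategy":{"rollingUpdate":{"maxUnavailable":"25%"}}}}'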

bdevcich commented 4 months ago

@behlendorf, I'm comfortable closing this issue now that we've implemented this manually on El Cap today. Do you agree?

behlendorf commented 4 months ago

Yup, things are looking much better after these changes.