Change data_home default to avoid Flexvolume plugin conflict in GKE

twilfong commented 1 year ago

The Scalr agent K8s Helm chart creates a DaemonSet in the worker template that makes use of a hostPath directory. Unfortunately, the default value for this is "/home/kubernetes/flexvolume/agent-k8s", which is a directory that the GKE distribution of Kubernetes uses as its Flexvolume plugin directory.

GKE changes the default Flexvolume plugin directory from /var/lib/kubelet/volumeplugins to /home/kubernetes/flexvolume, in its Kubelet configuration. (Flexvolume is deprecated but still supported.) If this directory exists, Kubelet automatically scans it for new custom volume driver plugins, which causes (non-critical) errors to be constantly logged by the kubelet on every node in the cluster where this chart is installed.

This directory should be changed to something that no service running on the host should expect to be used for any other purpose. This is a simple one line change to do this, but a longer-term fix might be to move away from using a hostPath directly.

mermoldy commented 1 year ago

Hi @twilfong, thank you for pointing this out. We wasn't aware that using flexvolume was causing issues in kubelet services. The problem with GKE's hostPath is that it's currently the only directory with the executable permissions required for tofu/terraform to execute its plugins. Other directories on the root node disk like /home/scalr lack execution permissions, leading to issues during tf init:

│ Error: failed to read provider configuration schema for registry.terraform.io/hashicorp/null: failed to instantiate provider "registry.terraform.io/hashicorp/null" to obtain schema: fork/exec .terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5: permission denied

A more robust solution for GKE is to use external per-node data stores like https://cloud.google.com/kubernetes-engine/docs/concepts/local-ssd. The data_dir can then be mounted to /mnt/disks/ssd0/. However, this isn't something that can be included in the default chart configuration.

Using flexvolume is more of a workaround than a permanent solution, and moving away from hostPath usage is definitely a long-term goal.

gabrielrinaldi commented 1 year ago

I am having an issue related to this as well, I'm running it on EKS, but I've tried a few different paths without success. Also I agree with @mermoldy that this should not be a long term solution.

twilfong commented 1 year ago

I have tested this on our GKE clusters, by changing the value of agent.data_home to "/home/scalr/agent-k8s" in our helm chart installation, and did run into this error. As you suggest, this is a volume that is mounted with noexec, and /home/kubernetes/flexvolume is not mounted with noexec. However, in the interest of not using a directory reserved for other purposes, and also to avoid the Kubelet error, I would suggest using a different directory mounted without noexec which shouldn't cause any kubelet issues. Doing just a bit more searching, I think that "/home/kubernetes/bin/scalr/agent-k8s" should work. The only potential issue with "/home/kubernetes/bin" is that it is intended for official K8s tools, but I still think it is a better compromise than using the flexvolume directory.

twilfong commented 1 year ago

@gabrielrinaldi: We've decided to use "/home/kubernetes/bin/scalr/agent-k8s" (as in my recent commit to the PR). In GKE this is mounted on a volume without the noexec option and works for us.

$ df /home/kubernetes/bin
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       98831908 36577436  62238088  38% /home/kubernetes/bin
$ findmnt /home/kubernetes/bin
TARGET               SOURCE                          FSTYPE OPTIONS
/home/kubernetes/bin /dev/sda1[/home/kubernetes/bin] ext4   rw,nosuid,nodev,relatime,commit=30

mermoldy commented 1 year ago

@twilfong

by changing the value of agent.data_home to "/home/scalr/agent-k8s" in our helm chart installation, and did run into this error.

It occurs during the execution of the run, not just at the start of the services. The behavior may also vary depending on the root node image.

Indeed, /home/kubernetes/bin/scalr/agent-k8s is also a viable option on GKE, at least. However, it cannot be changed easily because doing so could lead to data loss, as the old directory would not be migrated during an upgrade of existing helm installations. We likely need a pre-upgrade helm hook to migrate data and ensure no loss of the old state on the node disk (I'm not sure right now how Helm will behave when changing the default value). Additionally, we need to ensure that the new path functions correctly on common cloud providers like EKS and AKS before proceeding with such a change.

There are also other things we planned to do regarding the data_home directory: using some Helm installation identifier in the directory path name to avoid collisions when installing multiple charts on the same cluster. This should be done in one change to avoid multiple migrations. I'll create an internal ticket to address all these issues at once. There's also a chance that we may prioritize moving away from hostPath sooner, rather than investing time in improving the current solution.

You can close this PR for now, or convert it to a GitHub issue and I can post the status later, if you want to track the progress. I assume you're able to change agent.data_home and it isn't a blocker for you at the moment?

twilfong commented 1 year ago

We are able to change the "agent.data_home" value without it being a problem for us, but I think the larger community using this helm chart ought to be aware of the issue that the default value causes for the kubelet. At a minimum this causes a large increase in log volume (and thus increased cost). I'll switch this to an issue instead of a PR since it is apparently not a trivial change to make. See: https://github.com/Scalr/agent-helm/issues/33

Scalr / agent-helm

Change data_home default to avoid Flexvolume plugin conflict in GKE #32