Azure / kubernetes-keyvault-flexvol

Azure keyvault integration with Kubernetes via a Flex Volume
MIT License

Using KeyVault FlexVolume with Autoscaler #183

Open Shaked opened 4 years ago

Shaked commented 4 years ago

Describe the request

Hey, I'm currently using both GKE and AKS (for different purposes). On GKE, I have set the autoscaler to run a minimum of 0 GPU nodes in order to save money.

The problem is that once a GPU node is up, the keyvault-flexvol installation is not available.

If there were a way to let the autoscaler know that it needs to install keyvault-flexvol (or any other .yaml) on new nodes, that would solve the problem.
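For context, the GPU pool is set up roughly like this (cluster and pool names are illustrative):

gcloud container node-pools create gpu-pool \
  --cluster my-cluster \
  --accelerator type=nvidia-tesla-t4,count=1 \
  --enable-autoscaling --min-nodes 0 --max-nodes 3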

Explain why Key Vault FlexVolume needs it

It would allow developers to use KeyVault together with the autoscaler in order to share secrets.

For example, this would be very powerful when using kubeflow or azureml.

Describe the solution you'd like

Not sure if it's possible, but provision the node when it comes up, the same way NVIDIA does with its drivers for GPU nodes.

Describe alternatives you've considered

Additional context

berndverst commented 4 years ago

What you are asking for is exactly what a DaemonSet does, and the KeyVault FlexVolume installation configures a DaemonSet.

Does it work fine for CPU instances provisioned by auto scaling? Maybe the issue is just with the DaemonSet node selector. It should work fine as is on AKS.

Are you only having trouble on GKE? If you check the labels on the GPU instances, do they match the node selector? Make sure to also use the GKE-specific FlexVolume mount path.

https://github.com/Azure/kubernetes-keyvault-flexvol/blob/master/deployment/kv-flexvol-installer.yaml
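For example, a quick way to compare the two (DaemonSet name and namespace assumed from the installer yaml; node name is illustrative):

# Labels on the GPU node:
kubectl get node gke-gpu-node-1 --show-labels

# nodeSelector the installer DaemonSet uses:
kubectl get daemonset keyvault-flexvolume -n kv \
  -o jsonpath='{.spec.template.spec.nodeSelector}'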

Shaked commented 4 years ago

Does it work fine for CPU instances provisioned by auto scaling? Maybe the issue is just with the DaemonSet node selector. It should work fine as is on AKS.

I'm seeing another error; I think it's related to permissions, though. I will report back.

$ tail -n 10 /var/log/kv-driver.log
Thu Mar 12 21:55:57 UTC 2020 ismounted | not mounted
Thu Mar 12 21:55:57 UTC 2020 ERROR: {"status": "Failure", "message": "validation failed, tenantid is empty"}

Although tenantid is there (I copy-pasted your example with my own values).
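For what it's worth, the option can be checked against the pod spec like this (pod name is illustrative):

kubectl get pod my-gpu-pod -o yaml | grep -A 15 flexVolume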


Are you only having trouble on GKE? If you check the labels on the GPU instances, do they match the node selector? Make sure to also use the GKE-specific FlexVolume mount path.

Labels are matching and I have edited the .yaml file accordingly.

Currently I'm not seeing it on the GPU node, and I think it won't work either way because the secretRef is not set automatically.

EDIT:

I have logged in to the GKE GPU node and checked whether the FlexVolume driver exists:

user@gke-deployments-nap-n1-highmem-4-gpu1-XYZ-XYZ ~ $ ls -la /home/kubernetes/flexvolume/
total 8
drwxr-xr-x  2 root root 4096 Mar 12 20:06 .
drwxr-xr-x 10 root root 4096 Mar 12 20:06 ..

This of course shows that there's no FlexVolume driver installed.

Shaked commented 4 years ago

Hey @berndverst,

I have been testing one of the two issues: the CPU problem with the missing tenantId.

I have finally figured out the root cause! When running the kv binary manually on the node, I see the following error:

/home/kubernetes/flexvolume/azure~kv $ sudo ./kv mount /tmp/test1 '{"usepodidentity": "false","resourcegroup": "resourcegroup-name","keyvaultname": "vault-name","keyvaultobjectnames": "secret-name","keyvaultobjectaliases": "secret.json","keyvaultobjecttypes": "secret","subscriptionid": "<subscriptionid>","tenantid": "<tenantid>"}'

./kv: line 41: /usr/bin/jq: No such file or directory
./kv: line 42: /usr/bin/jq: No such file or directory
./kv: line 44: /usr/bin/jq: No such file or directory
./kv: line 45: /usr/bin/jq: No such file or directory
./kv: line 48: /usr/bin/jq: No such file or directory
./kv: line 49: /usr/bin/jq: No such file or directory
./kv: line 50: /usr/bin/jq: No such file or directory
./kv: line 51: /usr/bin/jq: No such file or directory
./kv: line 53: /usr/bin/jq: No such file or directory
./kv: line 54: /usr/bin/jq: No such file or directory
./kv: line 55: /usr/bin/jq: No such file or directory
./kv: line 58: /usr/bin/jq: No such file or directory
./kv: line 59: /usr/bin/jq: No such file or directory
./kv: line 60: /usr/bin/jq: No such file or directory
./kv: line 64: /usr/bin/jq: No such file or directory
./kv: line 65: /usr/bin/jq: No such file or directory
./kv: line 66: /usr/bin/jq: No such file or directory
{"status": "Failure", "message": "validation failed, tenantid is empty"}

As this is a GKE node, I cannot install jq.

Any ideas how this can be fixed?

EDIT:

The GKE node runs Container-Optimized OS (based on Chromium OS), where, as far as I can tell, it's not possible to install jq, but I might be wrong on that one.

My quick idea to solve this would be to add a small Python script:

#!/usr/bin/python
import sys, json

# Read a JSON object from stdin and print the value of the key given as
# the first argument; print nothing if the key is absent.
data = json.load(sys.stdin)
findkey = sys.argv[1]
if findkey in data:
    print(data[findkey])

This will ensure that when the options JSON, i.e. `{"keyvaultname": "keyvaultname","keyvaultobjectaliases": "keyvaultobjectaliases","keyvaultobjectnames": "keyvaultobjectnames","keyvaultobjecttypes": "keyvaultobjecttypes","kubernetes.io/fsType": "kubernetes.io/fsType","kubernetes.io/pod.name": "kubernetes.io/pod.name","kubernetes.io/pod.namespace": "kubernetes.io/pod.namespace","kubernetes.io/pod.uid": "kubernetes.io/pod.uid","kubernetes.io/pvOrVolumeName": "kubernetes.io/pvOrVolumeName","kubernetes.io/readwrite": "kubernetes.io/readwrite","kubernetes.io/secret/clientid": "kubernetes.io/secret/clientid","kubernetes.io/secret/clientsecret": "kubernetes.io/secret/clientsecret","kubernetes.io/serviceAccount.name": "kubernetes.io/serviceAccount.name","tenantid": "tenantid"}`, is passed and jq doesn't exist, the script takes its place.

Example:

$ echo '{ "test": "value", "something": "azure"}' |  ./jq-replacement.py something
azure
$ echo '{ "test": "value", "something": "azure"}' |  ./jq-replacement.py something1

Then, in the kv binary I'd add a check:

JQ=/usr/bin/jq
USE_JQ=TRUE
if [ ! -f "$JQ" ]; then
    USE_JQ=FALSE
fi
...
mount() {
  if [ "$USE_JQ" = "TRUE" ]; then
    :  # mount with the current jq-based implementation
  else
    :  # mount with the kv_json_parser.py implementation
  fi
}
If there's no plan to use nested (multi-dimensional) JSON, it might be worth considering removing the jq dependency entirely. That said, it's worth thinking through, as Python versions might also become an issue.

What do you think? I can make a PR if this solution seems fair enough.

Shaked commented 4 years ago

Hey @berndverst

The problem is that once a GPU node is up, the keyvault-flexvol installation is not available.

I have finally found the problem.

The GPU nodes contain a taint:

kubectl describe node gke-node | grep -i 'nosche'
Taints:             nvidia.com/gpu=present:NoSchedule

Node affinity, described here, is a property of pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite – they allow a node to repel a set of pods.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints. Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/

This means that in order for keyvault-flexvol to work, one will have to update the tolerations in kv-flexvol-installer.yaml:

      tolerations:
        - key: nvidia.com/gpu
          value: present
          effect: NoSchedule
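For an already-deployed DaemonSet, the same toleration can also be patched in (DaemonSet name and namespace assumed from the installer yaml; note that a strategic merge patch replaces the whole tolerations list):

kubectl patch daemonset keyvault-flexvolume -n kv --patch '
spec:
  template:
    spec:
      tolerations:
      - key: nvidia.com/gpu
        value: present
        effect: NoSchedule
'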

I will add this to the PR as a comment.

berndverst commented 4 years ago

Hi @ritazh, hope you are well :)

Can you look at this issue or PR #186 about removing the reliance on jq, at least outside of AKS? There appear to be some folks wanting to use KeyVault from GKE, and jq isn't available there.

Also, it would be good to provide some instructions for adding the toleration to the DaemonSet spec in case someone wants to run the FlexVolume driver on GPU nodes as well. Alternatively, provide one installer yaml for CPU instances and one for CPU + NVIDIA GPU instances. I'm not sure a comment in the yaml alone is obvious enough.

Shaked commented 4 years ago

@berndverst

Also, it would be good to provide some instructions for adding the toleration to the DaemonSet spec in case someone wants to run the FlexVolume driver on GPU nodes as well. Alternatively, provide one installer yaml for CPU instances and one for CPU + NVIDIA GPU instances. I'm not sure a comment in the yaml alone is obvious enough.

Honestly, I think that by default the installation should work on all types of nodes, i.e. uncomment the tolerations part and instead add a comment explaining why it's there.

berndverst commented 4 years ago

@Shaked I suppose it doesn't really matter anymore. I just learned that mounting KeyVault will be done differently in the future. Kubernetes moves fast as you know! Check this out: https://github.com/Azure/secrets-store-csi-driver-provider-azure

You'll need to use the Service Principal option if you are not running on AKS / AKS Engine clusters in Azure.
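Roughly, the Service Principal option there amounts to storing the SP credentials in a Kubernetes secret that the CSI volume references (names follow that repo's docs, but treat this as a sketch):

kubectl create secret generic secrets-store-creds \
  --from-literal clientid=<service-principal-client-id> \
  --from-literal clientsecret=<service-principal-client-secret>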

Shaked commented 4 years ago

@berndverst

I will definitely check it out, thank you!

I still think it will take time to migrate from the current keyvault-flexvol implementation to the new one, and it's easier to upgrade its version than to migrate and commit our resources to it.

Therefore, I'd still be happy if this PR were accepted.

Do you know if the new implementation will allow importing secrets from AKV directly into env vars instead of mounting them as files?

berndverst commented 4 years ago

@ritazh and team are whom you have to convince :)

ritazh commented 4 years ago

@Shaked sorry for the delay, and thank you for the PR. Will review and test the PR soon!