bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

Unable to mount NFS persistent volume from pod running on EKS Bottlerocket node #4022

Open younid opened 4 months ago

younid commented 4 months ago

I have an application deployed on AWS EKS with Amazon Linux worker nodes. The application mounts an NFS persistent volume backed by EFS storage. Since migrating my EKS cluster to Bottlerocket OS nodes, the NFS mount fails on the workers. Nothing has changed in the application or on the EFS storage. I tested mounting EFS directly from the EKS nodes with the mount command, and it worked fine. The problem occurs only when the kubelet tries to mount the NFS volume for the pod/container. I also noticed that pod data is not stored in the same location on standard Linux workers and on Bottlerocket OS workers:
Linux nodes = /var/lib/kubelet/pods/xxxx
Bottlerocket nodes = /.bottlerocket/rootfs/var/lib/kubelet/pods/xxxx

Could this be the root cause?

Image I'm using: bottlerocket-aws-k8s-1.29-x86_64-v1.19.4-4f0a078e

What I expected to happen: When we create a pod that uses an in-tree NFS volume through a PVC, the pod, PV, and PVC are all created, the PVC is bound, and the volume is mounted into the pod. This is the behavior described in the Kubernetes documentation, and it worked until we migrated to Bottlerocket OS: https://kubernetes.io/docs/concepts/storage/volumes/#nfs

What actually happened: The pod and the PVC are created, but the pod stays stuck in the init container status because the NFS mount step fails. The kubectl describe output for the pod is:

Events: 
  Type     Reason       Age                From               Message 
  ----     ------       ----               ----               ------- 
  Normal   Scheduled    50m                default-scheduler  Successfully assigned default/my-application to ip-xxx-xxx-xxx-xxx.eu-west-1.compute.internal 
  Warning  FailedMount  49m (x8 over 50m)  kubelet            MountVolume.SetUp failed for volume "data-files" : mount failed: exit status 32 
Mounting command: mount 
Mounting arguments: -t nfs fs-xxxxxxxxxx.efs.eu-west-1.amazonaws.com:/data-files /var/lib/kubelet/pods/408b3fff-4bfc-4152-8f93-b7192bc7/volumes/kubernetes.io~nfs/data-files 
Output: mount: /var/lib/kubelet/pods/408b3fff-4bfc-4152-8f93-b7192bc7/volumes/kubernetes.io~nfs/data-files: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program. 
       dmesg(1) may have more information after failed mount system call. 

How to reproduce the problem: Create a pod that mounts a volume declared with the in-tree NFS plugin, pointing at an NFS server. The deployment works on most Kubernetes node types, but not on Bottlerocket OS nodes.

Here is the manifest I used to reproduce this behavior on an AWS EKS cluster with Bottlerocket nodes:

apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    type: nfs
  name: nfs-test-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    path: /
    server: MY-DATA-STORAGE.efs.eu-west-1.amazonaws.com
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
  name: nfs-test-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      type: nfs
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: nfs-test-pv

---
apiVersion: v1
kind: Pod
metadata:
  labels:
  name: nfs-test-deployment
  namespace: default
spec:
  imagePullSecrets:
  - name: docker-registry
  containers:
  - name: nfs-test-pod
    image: busybox
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - mountPath: /tmp/nfs
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: nfs-test-pvc

arnaldo2792 commented 4 months ago

Hey @younid, thanks for letting us know about this. This message:

bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.

makes me think that the default NFS provisioner attempts to use a mount helper from the host, which we don't provide. I noticed that you use EFS; have you tried the EFS CSI driver? It contains all the binaries needed to support EFS volumes out of the box. Or have you seen this guide? It explains how to create your own NFS persistent volume definition using the CSI driver instead of the default provisioner.
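A minimal sketch of such a definition, assuming the Amazon EFS CSI driver is already installed in the cluster (for example as the EKS add-on). It statically provisions a PV whose csi.driver is efs.csi.aws.com instead of using the in-tree nfs plugin, so the mount is handled by the driver rather than by a /sbin/mount.nfs helper on the host. fs-xxxxxxxxxx is a placeholder for the EFS file system ID, and the names efs-test-pv and efs-test-pvc are hypothetical.

```yaml
# Sketch only: static PV/PVC pair backed by the Amazon EFS CSI driver.
# Assumes the driver is installed; fs-xxxxxxxxxx is a placeholder ID.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-test-pv            # hypothetical name
spec:
  capacity:
    storage: 1Gi               # required by the API; EFS itself is elastic
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""         # static provisioning, no StorageClass
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxxxx  # file system root, like "path: /" in the in-tree PV
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-test-pvc           # hypothetical name
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ""
  volumeName: efs-test-pv      # pre-bind the claim to the PV above
```

The pod from the reproduction manifest would then only need claimName: efs-test-pvc. To mount a subdirectory such as /data-files instead of the file system root, the EFS CSI driver's examples use a volumeHandle of the form fs-xxxxxxxxxx:/data-files; check the driver documentation for the version you run.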

younid commented 4 months ago

Hello @arnaldo2792, thank you for your answer and support. I tested the EFS CSI driver on a new deployment of our application and it works fine. My problem is with our existing installed environments that we want to migrate to Bottlerocket. Those environments are already deployed with in-tree NFS volumes, and we can't redefine their deployments. We are looking for a way to make those deployments work before we start using Bottlerocket OS for all our future deployments (possibly with the EFS CSI driver). Until we manage to get the legacy environments working, we can't adopt Bottlerocket OS :-/

Do you think it's possible to do anything on the workers to make them accept mounts from the default NFS provisioner?

KCSesh commented 3 months ago

Hey @younid,

We are considering including nfs-utils in Bottlerocket to support this use case and fix the error:

bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.

If that were to happen, it should help you out.

However, we don't have a proposed timeline for this change.