ctrox / csi-s3

A Container Storage Interface for S3
Apache License 2.0

Unable to Mount volume at Pod #51

Open · ghost opened this issue 3 years ago

ghost commented 3 years ago

Hey folks,

maybe someone can give me a hint here. For testing purposes I use MinIO as the S3 provider. Creating and attaching a PVC works fine, but I'm unable to mount the volume in a given Pod:

Normal   Scheduled               12s                default-scheduler        Successfully assigned kube-system/csi-s3-test-nginx to worker04
  Normal   SuccessfulAttachVolume  12s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-50edc794-e00b-4be8-8ccf-35b9b545bd4a"
  Warning  FailedMount             1s (x4 over 4s)    kubelet                  MountVolume.MountDevice failed for volume "pvc-50edc794-e00b-4be8-8ccf-35b9b545bd4a" : rpc error: code = Unknown desc = Get "http://filelake.kube-system.svc.cluster.local:7777/pvc-50edc794-e00b-4be8-8ccf-35b9b545bd4a/?location=": dial tcp: lookup filelake.kube-system.svc.cluster.local on 1.1.1.1:53: no such host

I'm aware that the error says the host is not resolvable, but the funny thing is that I can reach the URL "filelake.kube-system.svc.cluster.local" from every Pod in my cluster, and DNS resolution seems to work as expected ...
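(A quick way to reproduce such a check from a throwaway pod; busybox is just used here as an example image:)

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup filelake.kube-system.svc.cluster.local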

The PersistentVolumeClaim itself also looks fine to me:


Name:          csi-s3-pvc
Namespace:     kube-system
StorageClass:  csi-s3
Status:        Bound
Volume:        pvc-50edc794-e00b-4be8-8ccf-35b9b545bd4a
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ch.ctrox.csi.s3-driver
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       csi-s3-test-nginx
Events:
  Type    Reason                 Age   From                                                                              Message
  ----    ------                 ----  ----                                                                              -------
  Normal  ExternalProvisioning   8m2s  persistentvolume-controller                                                       waiting for a volume to be created, either by external provisioner "ch.ctrox.csi.s3-driver" or manually created by system administrator
  Normal  Provisioning           8m2s  ch.ctrox.csi.s3-driver_csi-provisioner-s3-0_c3a1a4d4-44f7-4673-be0e-436df8551b6d  External provisioner is provisioning volume for claim "kube-system/csi-s3-pvc"
  Normal  ProvisioningSucceeded  8m    ch.ctrox.csi.s3-driver_csi-provisioner-s3-0_c3a1a4d4-44f7-4673-be0e-436df8551b6d  Successfully provisioned volume pvc-50edc794-e00b-4be8-8ccf-35b9b545bd4a

What could be the cause of this issue? All logs seem fine, and a bucket also gets provisioned in MinIO. Everything seems to work except the actual mount on the Pod side.

Thanks in advance :D

ghost commented 3 years ago

I just checked: sudo systemctl show --property=MountFlags docker.service

which returns the output below, i.e. no value has been set for MountFlags. Could my issue be here? And if so, how do I change it?

MountFlags=

ghost commented 3 years ago

I tried the following:

mkdir -p /etc/systemd/system/docker.service.d/
cat <<EOF > /etc/systemd/system/docker.service.d/mount_propagation_flags.conf
[Service]
MountFlags=shared
EOF

# systemctl daemon-reload
# systemctl restart docker.service

Seems to have no effect so far, even though the flag is now set:

sudo systemctl show --property=MountFlags docker.service
MountFlags=shared

ctrox commented 3 years ago

I don't think the cause here is the Docker MountFlags; that would result in a different error.

Even though your DNS resolution seems to work in the provisioner pod, I think the biggest indication here is that you get a DNS error in the mounter (which is running in a different Pod). Can you try to configure the endpoint with the service IP instead of the DNS name, just to see if that works? So we can really rule out the DNS issue.
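For example, assuming the endpoint is set in the csi-s3 secret the way the example manifests in this repo do it (field names below follow those examples, so adjust to however you configured yours), that would mean swapping the DNS name for the service's ClusterIP:

apiVersion: v1
kind: Secret
metadata:
  name: csi-s3-secret
  namespace: kube-system
stringData:
  accessKeyID: <ACCESS_KEY_ID>
  secretAccessKey: <SECRET_ACCESS_KEY>
  # ClusterIP of the minio service instead of the DNS name
  endpoint: http://<service-cluster-ip>:7777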

ghost commented 3 years ago

Hello again, thx for your quick reply,

this is the original URL: http://filelake.kube-system.svc.cluster.local:7777

This is the IP from the K8s CIDR: http://10.43.38.101:7777

Result: Works like a charm!

But why do I get a DNS resolution error only here? I would expect to be able to resolve the DNS name just like for any other internal K8s service.

Many thanks in advance

ctrox commented 3 years ago

Really hard to say, the driver should not be messing with DNS. Can you exec into one of the csi-s3 daemonset pods and cat /etc/resolv.conf? Also, which mounter are you using?
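Something along these lines should do it (the pod name is whatever kubectl lists for the DaemonSet):

kubectl -n kube-system get pods -l app=csi-s3
kubectl -n kube-system exec <csi-s3-pod-name> -- cat /etc/resolv.conf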

ghost commented 3 years ago

Same for me xD. No clue where to look. I tried both rclone and s3fs, with exactly the same issue and behavior. As soon as I'm back in the office I will look into resolv.conf.

Thx so far for your support :)

TheMatrix97 commented 1 year ago

Hi! I faced the same issue and found a solution... I'm posting it here for future reference: the problem resides in the configuration of the csi-s3 DaemonSet.

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-s3
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-s3
  template:
    metadata:
      labels:
        app: csi-s3
    spec:
      hostNetwork: true
...

It seems that when hostNetwork is enabled, we should also set the dnsPolicy to "ClusterFirstWithHostNet", so that cluster-local services can be resolved alongside external ones. Although I'm not sure why the DaemonSet is configured with hostNetwork: true in the first place....

So, the Daemonset definition should be:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-s3
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-s3
  template:
    metadata:
      labels:
        app: csi-s3
    spec:
      hostNetwork: true
      dnsPolicy: "ClusterFirstWithHostNet"
...
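If the DaemonSet is already deployed, a merge patch along these lines should apply the same change without re-applying the whole manifest (untested sketch; patching the pod template also rolls the DaemonSet pods, which is needed for the new dnsPolicy to take effect anyway):

kubectl -n kube-system patch daemonset csi-s3 --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirstWithHostNet"}}}}'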

More info: see https://github.com/ctrox/csi-s3/pull/76

fallmo commented 1 year ago

I had the same problem. I looked at the logs of the csi-attacher-s3 pod; first I saw Failed to list *v1beta1.VolumeAttachment: the server could not find the requested resource. I figured it was a k8s version issue, so I updated the container image of the csi-attacher StatefulSet from v2.2.1 to canary (the latest):

kubectl -n kube-system set image statefulset/csi-attacher-s3 csi-attacher=quay.io/k8scsi/csi-attacher:canary

Next I got a permission error: `failed to list *v1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:csi-attacher-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope`.

I tried to modify the role bindings but I couldn't find the right combination, so I ended up giving the csi-attacher-sa service account cluster-admin privileges as shown below:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: csi-attacher-all
subjects:
  - kind: ServiceAccount
    name: csi-attacher-sa
    namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
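For anyone who would rather not hand out cluster-admin: a narrower ClusterRole covering just the permissions the attacher complained about might look like the sketch below (untested; the verbs are based on what the external-attacher typically needs, so adjust if it still logs RBAC errors). Bind it via the same ClusterRoleBinding in place of cluster-admin.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: csi-attacher-volumeattachments
rules:
  # The attacher lists, watches and updates VolumeAttachment objects cluster-wide
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "update", "patch"]
  # Newer attacher versions also patch the status subresource
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments/status"]
    verbs: ["patch"]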