GoogleCloudPlatform / gcs-fuse-csi-driver

The Google Cloud Storage FUSE Container Storage Interface (CSI) Plugin.
Apache License 2.0

Fail to mount the PV when using Anthos Service Mesh #40

Open · ybelleguic opened this issue 1 year ago

ybelleguic commented 1 year ago

Hello,

I'm encountering an issue when mounting a bucket as a PV with Anthos Service Mesh. The YAML manifests are at the end of this issue. It works perfectly fine when Istio injection is disabled.

  Type     Reason       Age              From               Message
  ----     ------       ----             ----               -------
  Normal   Scheduled    14s              default-scheduler  Successfully assigned nginx/nginx-d576dc799-6dmvs to xxxxxxxxxxx
  Normal   Pulled       11s              kubelet            Container image "gcr.io/gke-release/asm/proxyv2:1.15.7-asm.8" already present on machine
  Normal   Created      11s              kubelet            Created container istio-init
  Normal   Started      11s              kubelet            Started container istio-init
  Normal   Pulled       10s              kubelet            Container image "gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v0.1.3-gke.0@sha256:854e1aa1178dc3f7e3ec5fa03cea5e32f0385ff6230efd836a22e86beb876740" already present on machine
  Normal   Created      10s              kubelet            Created container gke-gcsfuse-sidecar
  Normal   Started      9s               kubelet            Started container gke-gcsfuse-sidecar
  Warning  Failed       2s               kubelet            Error: failed to generate container "77ccfad98f48aa01e248fed7e7a444e14a348b06bc55531a158a14462c4b406e" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
  Normal   Pulled       2s               kubelet            Container image "gcr.io/gke-release/asm/proxyv2:1.15.7-asm.8" already present on machine
  Normal   Created      2s               kubelet            Created container istio-proxy
  Normal   Started      2s               kubelet            Started container istio-proxy
  Warning  Unhealthy    1s               kubelet            Readiness probe failed: Get "http://100.64.128.58:15021/healthz/ready": dial tcp 100.64.128.58:15021: connect: connection refused
  Warning  Failed       1s               kubelet            Error: failed to generate container "8c309e092fd45b084460c54349deff6d01e55bfd8b4db97e5041032dc3a10bca" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
  Normal   Pulled       0s (x3 over 9s)  kubelet            Container image "nginx:1.14.2" already present on machine
  Warning  FailedMount  0s (x2 over 1s)  kubelet            MountVolume.SetUp failed for volume "gcs-fuse-csi-pv" : rpc error: code = Internal desc = the sidecar container failed with error: mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
gcsfuse exited with error: exit status 1
  Warning  Failed  0s  kubelet  Error: failed to generate container "5b5229a1b7ccaf54885b2dbbe34b1ec0d41e42d783934c30e49c2b7e816019eb" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 5Gi
  storageClassName: static-files-bucket
  claimRef:
    namespace: nginx
    name: gcs-fuse-csi-static-pvc
  mountOptions:
    - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-bucket
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
  namespace: nginx
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
  volumeName: gcs-fuse-csi-pv
  storageClassName: static-files-bucket
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx
  namespace: nginx
  annotations:
    iam.gke.io/gcp-service-account: nginx-gcs@{PROJECT_ID}.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - name: gcs-fuse-csi-static
          mountPath: /data
          readOnly: true
      serviceAccountName: nginx
      volumes:
      - name: gcs-fuse-csi-static
        persistentVolumeClaim:
          claimName: gcs-fuse-csi-static-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: nginx
spec:
  ports:
  - name: http
    port: 80
  selector:
    app: nginx
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: googleapi
  namespace: nginx
spec:
  hosts:
  - googleapis.com
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS
songjiaxun commented 1 year ago

Hi @ybelleguic , I could not reproduce the error on my end. The error mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. indicates that the service account was not set up correctly. Could you double-check the doc https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/blob/main/docs/authentication.md and make sure Workload Identity is set up correctly?
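
For reference, a typical Workload Identity check can be sketched as follows. This is only a sketch based on the manifests posted above: the GCP service account nginx-gcs@PROJECT_ID.iam.gserviceaccount.com and the nginx/nginx namespace/ServiceAccount come from that YAML, and PROJECT_ID is a placeholder for your project.

```shell
# Allow the Kubernetes ServiceAccount nginx/nginx to impersonate the GCP
# service account (names taken from the manifests in this issue).
gcloud iam service-accounts add-iam-policy-binding \
  "nginx-gcs@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[nginx/nginx]"

# Confirm the annotation linking the KSA to the GSA is present.
kubectl get serviceaccount nginx -n nginx \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
```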

zhangluva commented 12 months ago

I have exactly the same errors from the sidecar. Is this related to the federated workload identity mentioned here? My workload identity pool has federation set up, and I think Anthos probably also uses federation; that seems to be common across the 3 different issues. I tried to start a container using gcr.io/google.com/cloudsdktool/cloud-sdk:latest with the same service account and verified I am able to list/upload/download from the GCS bucket. So the service account, IAM, and permissions are all set up correctly.
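
For anyone wanting to reproduce this check, one way is to run the cloud-sdk image under the same Kubernetes ServiceAccount and hit the bucket directly. This is a sketch, assuming the nginx namespace/ServiceAccount and the my-bucket bucket name from the manifests in this issue:

```shell
# Launch a throwaway pod that uses the same KSA as the workload, then
# list the bucket; success here rules out IAM/permission problems.
kubectl run gcs-check -n nginx --rm -it --restart=Never \
  --image=gcr.io/google.com/cloudsdktool/cloud-sdk:latest \
  --overrides='{"spec": {"serviceAccountName": "nginx"}}' \
  -- gsutil ls gs://my-bucket
```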

ybelleguic commented 12 months ago

Hello,

Workload Identity was set up correctly on my side.

My problem was related to the outboundTrafficPolicy mode set in the mesh. When the mode is set to REGISTRY_ONLY, we have to declare an Istio ServiceEntry for storage.googleapis.com and add the annotation traffic.sidecar.istio.io/excludeOutboundIPRanges: "169.254.169.254/32" on the pods.
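
Concretely, the fix can be sketched like this, adapting the ServiceEntry already posted above to cover storage.googleapis.com; note the annotation goes on the pod template of the Deployment, next to the existing gke-gcsfuse/volumes annotation:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: google-storage-api
  namespace: nginx
spec:
  hosts:
  - storage.googleapis.com
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS
---
# Deployment excerpt: exclude the GKE metadata server (169.254.169.254)
# from istio-proxy interception so gcsfuse can fetch default credentials.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "169.254.169.254/32"
```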

So I guess this issue can be closed?

songjiaxun commented 12 months ago

Ah I see, thanks @ybelleguic for the troubleshooting step!

@zhangluva , could you follow this step and retry on your side? If it helps, please let me know, and I will update the documentation. Thank you!

zhangluva commented 11 months ago

Thanks @songjiaxun for your quick reply. I did go through the IAM and permission settings and everything looked good. Following are my steps to verify IAM/permissions.

So I don't think it's an IAM permission issue. Having the K8s service account impersonate the GCP service account and then access the GCS bucket all worked as expected when not using the sidecar.

Thanks,