awslabs / mountpoint-s3-csi-driver

Built on Mountpoint for Amazon S3, the Mountpoint CSI driver presents an Amazon S3 bucket as a storage volume accessible by containers in your Kubernetes cluster.
Apache License 2.0

amazon s3 csi driver mount issue EKS cluster 1.28 #185

Closed tppalani closed 1 month ago

tppalani commented 2 months ago

/kind bug

NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report

What happened?

I have deployed https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml, but the pod is not coming up due to a mount access-denied error.

```
$ k get pvc s3-claim
NAME       STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
s3-claim   Bound    s3-pv    1Gi        RWX                           3m56s

$ k get pv s3-pv
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
s3-pv   1Gi        RWX            Retain           Bound    default/s3-claim                           4m22s
```
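For context, the linked static_provisioning.yaml defines roughly the following PV/PVC pair. This is a sketch from memory of that manifest, not a verbatim copy; the bucket name and region from this issue are substituted in, and the `volumeHandle` value is an arbitrary placeholder:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1Gi # ignored by the driver, but required by the API
  accessModes:
    - ReadWriteMany
  mountOptions:
    - allow-delete
    - region us-east-2
  csi:
    driver: s3.csi.aws.com # required
    volumeHandle: s3-csi-driver-volume # must be unique per volume
    volumeAttributes:
      bucketName: palani-test-bucket
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "" # required for static provisioning
  resources:
    requests:
      storage: 1Gi # ignored, required
  volumeName: s3-pv
```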

What you expected to happen?

```
s3-csi-node-4plmw                               3/3     Running   0          24h
s3-csi-node-64m7g                               3/3     Running   0          24h
s3-csi-node-br9kn                               3/3     Running   0          21h
s3-csi-node-dldq8                               3/3     Running   0          24h
s3-csi-node-h9hls                               3/3     Running   0          21h
```

Policy (Terraform excerpt, truncated):

    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Principal" : {
          "Federated" : "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer_url}"
        },
        "Action" : "sts:AssumeRoleWithWebIdentity",
        "Condition" : {
          "StringEquals" : { 
            "${local.eks_oidc_issuer_url}:aud": "sts.amazonaws.com",  
            "${local.eks_oidc_issuer_url}:sub": "system:serviceaccount:kube-system:s3-csi-*"             
          }
        }
      }
    ]
  })
  inline_policy = [{
    name = "s3-csi-mount-inline-policy"
    policy = jsonencode({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MountpointFullBucketAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::palani-test-bucket"
                ]
            },
            {
                "Sid": "MountpointFullObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "kms:Encrypt",
                    "kms:Decrypt",
                    "kms:ReEncrypt*",
                    "kms:GenerateDataKey*",
                    "kms:DescribeKey",
                    "kms:CreateGrant",
                    "kms:ListGrants",
                    "kms:RevokeGrant"

                ],
                "Resource": [
                    # "arn:aws:s3:::palani-test-bucket/",
                    "arn:aws:s3:::palani-test-bucket/*"
                ]
            }
        ]
    })

How to reproduce it (as minimally and precisely as possible)?

```
  Warning  FailedScheduling  38s               default-scheduler  0/7 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Normal   Nominated         37s               karpenter          Pod should schedule on: machine/default-ng8p4, node/ip-10-2-3-4.us-east-2.compute.internal
  Normal   Scheduled         26s               default-scheduler  Successfully assigned default/s3-app to ip-10-1-2-3.us-east-2.compute.internal
  Warning  FailedMount       9s (x6 over 26s)  kubelet            MountVolume.SetUp failed for volume "s3-pv" : rpc error: code = Internal desc = Could not mount "palani-test-bucket" at "/var/lib/kubelet/pods/73ed87c0-1450-4716-9a1a-619dc8edc42e/volumes/kubernetes.io~csi/s3-pv/mount": Mount failed: Failed to start service
  output: Error: Failed to create S3 client
  Caused by:
      0: initial ListObjectsV2 failed for bucket palani-test-bucket in region us-east-2
      1: Client error
      2: Forbidden: Access Denied
  Error: Failed to create mount process
```

Anything else we need to know?:

Environment

arsh commented 2 months ago

This seems to be a problem where credentials aren't properly set up. Can you try the following:

  1. Ensure OIDC is enabled on the cluster. This command should produce output: `aws iam list-open-id-connect-providers | grep $(aws eks describe-cluster --name $MY_CLUSTER --query "cluster.identity.oidc.issuer" --output text | sed 's/.*\///')`
  2. Check that your service account is annotated properly: `kubectl describe sa s3-csi-driver-sa -n YOUR_NAMESPACE`. It should have an annotation like this: `Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/s3-csi-driver-role`
  3. Ensure the proper trust relationship is in that role. It should look something like this:
    {  "Version": "2012-10-17",
    "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME",
          "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
        }
      }
    }
    ]
    }

These steps are from this knowledge base article, which has some more details: https://repost.aws/knowledge-center/eks-troubleshoot-oidc-and-irsa.

Also the documentation for IRSA might be helpful: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

igor-golubovich commented 2 months ago

@tppalani I solved the same issue like this: https://github.com/awslabs/mountpoint-s3-csi-driver/issues/164#issuecomment-2072141519

dannycjones commented 2 months ago

> @tppalani I solved the same issue like this: #164 (comment)

Yes, it does look like the same issue!

It looks like the step to replace StringEquals with StringLike was missed. It should look like this:

```json
{
    "StringLike": {
        "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:kube-system:s3-csi-*",
        "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
    }
}
```
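As a quick sanity check, the pitfall can be detected programmatically: IAM only expands `*` and `?` wildcards under `StringLike`, so a wildcard `sub` value under `StringEquals` is matched literally and the `AssumeRoleWithWebIdentity` call fails. Below is a hypothetical helper (not part of the driver or the AWS CLI) that flags such conditions in a trust policy document:

```python
import json

def wildcards_under_stringequals(trust_policy: dict) -> list:
    """Return (condition_key, value) pairs that contain IAM wildcards
    (* or ?) but sit under StringEquals, where wildcards are literal."""
    bad = []
    for stmt in trust_policy.get("Statement", []):
        equals = stmt.get("Condition", {}).get("StringEquals", {})
        for key, value in equals.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                if "*" in v or "?" in v:
                    bad.append((key, v))
    return bad

# Example: the wildcard service-account pattern from this issue.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-2.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:kube-system:s3-csi-*"
      }
    }
  }]
}
""")

# A non-empty result means the condition should move under StringLike.
print(wildcards_under_stringequals(policy))
```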

I'll follow up with the folks owning the S3 User Guide to see if we can make that clearer for readers. (internal ref: d168967d-e615-4727-85fd-56028903ccd7)

dannycjones commented 1 month ago

@tppalani, does changing the StringEquals condition to StringLike solve your issue?

Let us know if you have any further issues and we can provide some more help here.

Ramneek-kalra commented 1 month ago

Hey @dannycjones !

Today I worked with another customer who came with the same issue: this sample app - https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml - isn't working for them, throwing the same error as discussed on this thread:

```
0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
```

To fix this, I checked everything (S3 driver role, OIDC provider mapping with the service account), but to my surprise the issue was resolved by also installing the EFS CSI driver add-on, which let the scheduler know there is a CSI driver component backing the StorageClass rather than falling back to the default "gp2" EBS-based StorageClass.
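For anyone hitting the `unbound immediate PersistentVolumeClaims` message: with static provisioning the claim normally opts out of dynamic provisioning explicitly, so the default StorageClass never gets involved. A sketch, assuming the PV/PVC names from this issue:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "" # empty string disables dynamic provisioning
  resources:
    requests:
      storage: 1Gi # ignored by Mountpoint, but required by the API
  volumeName: s3-pv # bind directly to the pre-created static PV
```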

Ask:

After this, my application came up and I can see a file created on the S3 bucket as well. I kindly request that you review the S3 CSI driver to see what difference lies between it and the EFS CSI driver (i.e., why including the EFS CSI driver solved this issue).

Your query:

> Does changing the StringEquals condition to StringLike solve your issue?

I don't think this makes any difference; for me, StringLike worked as smoothly as described in the docs.

Happy to follow up internally to help customers here!

passaro commented 1 month ago

Closing this issue. @tppalani, please reopen if the suggestion above did not work for you.

peterbosalliandercom commented 2 weeks ago

> Hey @dannycjones !
>
> Today I worked with another customer came with same issue i.e., this sample app - https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml isn't working for them and throwing same error as discussed on this thread as below:
>
> […]
>
> To fix this, I checked everything i.e., S3 Driver Role + OIDC Provider Mapping with Service Account, however to my surprise, issue resolved by having EFS CSI Driver Add-on as well installed to get the scheduler know that we have a CSI driver component to use StorageClass rather than using default "gp2" EBS based SC.
>
> […]

What steps did you take to fix this?

Ramneek-kalra commented 2 weeks ago

Hi @peterbosalliandercom,

Thanks for the follow-up. I just additionally added the AWS EFS CSI driver add-on, nothing more than that, and then deployed the application as normal.