awslabs / mountpoint-s3-csi-driver

Built on Mountpoint for Amazon S3, the Mountpoint CSI driver presents an Amazon S3 bucket as a storage volume accessible by containers in your Kubernetes cluster.
Apache License 2.0

amazon s3 csi driver mount issue EKS cluster 1.28 #185

Closed tppalani closed 1 month ago

tppalani commented 2 months ago

/kind bug

NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report

What happened?

I deployed the driver, but the pod does not come up because the mount fails with an access denied error.

$ k get pvc s3-claim
s3-claim   Bound    s3-pv    1Gi        RWX                           3m56s

$ k get pv s3-pv
s3-pv   1Gi        RWX            Retain           Bound    default/s3-claim                           4m22s

What you expected to happen?

s3-csi-node-4plmw                               3/3     Running   0          24h
s3-csi-node-64m7g                               3/3     Running   0          24h
s3-csi-node-br9kn                               3/3     Running   0          21h
s3-csi-node-dldq8                               3/3     Running   0          24h
s3-csi-node-h9hls                               3/3     Running   0          21h


    {
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Effect" : "Allow",
          "Principal" : {
            "Federated" : "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer_url}"
          },
          "Action" : "sts:AssumeRoleWithWebIdentity",
          "Condition" : {
            "StringEquals" : {
              "${local.eks_oidc_issuer_url}:aud": "",
              "${local.eks_oidc_issuer_url}:sub": "system:serviceaccount:kube-system:s3-csi-*"
            }
          }
        }
      ]
    }
  inline_policy = [{
    name = "s3-csi-mount-inline-policy"
    policy = jsonencode({
        "Version": "2012-10-17",
        "Statement": [
                "Sid": "MountpointFullBucketAccess",
                "Effect": "Allow",
                "Action": [
                "Resource": [
                "Sid": "MountpointFullObjectAccess",
                "Effect": "Allow",
                "Action": [

                "Resource": [
                    # "arn:aws:s3:::palani-test-bucket/",

How to reproduce it (as minimally and precisely as possible)?

 Warning  FailedScheduling  38s               default-scheduler  0/7 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Normal   Nominated         37s               karpenter          Pod should schedule on: machine/default-ng8p4, node/
  Normal   Scheduled         26s               default-scheduler  Successfully assigned default/s3-app to
  Warning  FailedMount       9s (x6 over 26s)  kubelet            MountVolume.SetUp failed for volume "s3-pv" : rpc error: code = Internal desc = Could not mount "palani-test-bucket" at "/var/lib/kubelet/pods/73ed87c0-1450-4716-9a1a-619dc8edc42e/volumes/": Mount failed: Failed to start service output: Error: Failed to create S3 client  Caused by:     0: initial ListObjectsV2 failed for bucket palani-test-bucket in region us-east-2     1: Client error     2: Forbidden: Access Denied Error: Failed to create mount process

Anything else we need to know?:


arsh commented 2 months ago

This seems to be a problem where credentials aren't set up properly. Can you try the following:

  1. Ensure OIDC is enabled on the cluster. This command should produce output: aws iam list-open-id-connect-providers | grep $(aws eks describe-cluster --name $MY_CLUSTER --query "cluster.identity.oidc.issuer" --output text|sed 's/.*\///')
  2. Check that your service account is annotated properly: kubectl describe sa s3-csi-driver-sa -n YOUR_NAMESPACE. It should have an annotation like this: Annotations: arn:aws:iam::ACCOUNT_ID:role/s3-csi-driver-role
  3. Ensure the proper trust relationship is in that role. It should look something like this:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME",
              "": ""
            }
          }
        }
      ]
    }

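The checks in steps 1 and 3 can be partly scripted. The following is a minimal Python sketch, not part of the driver or the AWS CLI: it mirrors the sed expression from step 1 and flags StringEquals conditions whose value contains a wildcard. The function names, the issuer URL, the account ID, and the "sts.amazonaws.com" audience value are assumptions chosen for illustration; substitute the values from your own cluster and role.

```python
def oidc_provider_id(issuer_url):
    # Same effect as the sed expression in step 1: drop everything up to
    # and including the last "/" of the issuer URL.
    return issuer_url.rsplit("/", 1)[-1]


def wildcard_keys_under_string_equals(trust_policy):
    """Return the StringEquals condition keys whose values contain '*' or '?'."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for statement in statements:
        equals = statement.get("Condition", {}).get("StringEquals", {})
        for key, value in equals.items():
            values = value if isinstance(value, list) else [value]
            if any("*" in v or "?" in v for v in values):
                flagged.append(key)
    return flagged


# Placeholder values for illustration only (hypothetical issuer and account).
ISSUER = "https://oidc.eks.us-east-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
PROVIDER = ISSUER.split("://", 1)[1]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::111122223333:oidc-provider/" + PROVIDER
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    PROVIDER + ":aud": "sts.amazonaws.com",  # assumed audience
                    PROVIDER + ":sub": "system:serviceaccount:kube-system:s3-csi-*",
                }
            },
        }
    ],
}

print(oidc_provider_id(ISSUER))  # EXAMPLED539D4633E53DE1B71EXAMPLE
print(wildcard_keys_under_string_equals(policy))
```

With this sample policy the second call reports the ":sub" key, because a "*" under StringEquals is compared literally rather than treated as a wildcard.
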
These steps are from this knowledge base which has some more details:

Also the documentation for IRSA might be helpful:

igor-golubovich commented 2 months ago

@tppalani I solved the same issue like this:

dannycjones commented 2 months ago

@tppalani I solved the same issue like this: #164 (comment)

Yes, it does look like the same issue!

It looks like the step to replace StringEquals with StringLike was missed. It should look like this:

    "StringLike": {
        "": "system:serviceaccount:kube-system:s3-csi-*",
        "": ""
    }

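To illustrate why the operator matters, here is a minimal Python sketch. It is not IAM's implementation; it only mimics the documented behaviour that StringEquals is an exact, case-sensitive comparison, while StringLike treats "*" as a multi-character wildcard and "?" as a single-character wildcard. The helper names string_equals and string_like are made up for this example, and the service account name s3-csi-driver-sa is the one used earlier in this thread.

```python
import re


def string_equals(pattern, value):
    # Exact, case-sensitive comparison: "*" is just a literal character here.
    return pattern == value


def string_like(pattern, value):
    # Case-sensitive match where "*" matches any run of characters
    # and "?" matches exactly one character; everything else is literal.
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


pattern = "system:serviceaccount:kube-system:s3-csi-*"
subject = "system:serviceaccount:kube-system:s3-csi-driver-sa"

print(string_equals(pattern, subject))  # False
print(string_like(pattern, subject))    # True
```

Under StringEquals the token's subject would have to be the literal string ending in "s3-csi-*", which no service account token carries, so the role cannot be assumed and the mount fails with Access Denied.
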
I'll follow up with the folks owning the S3 User Guide to see if we can make that clearer for readers. (internal ref: d168967d-e615-4727-85fd-56028903ccd7)

dannycjones commented 1 month ago

@tppalani, does changing the StringEquals condition to StringLike solve your issue?

Let us know if you have any further issues and we can provide some more help here.

Ramneek-kalra commented 1 month ago

Hey @dannycjones !

Today I worked with another customer who came with the same issue: this sample app isn't working for them and throws the same error discussed in this thread:

0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..

To fix this, I checked everything, i.e., the S3 driver role and the OIDC provider mapping with the service account. To my surprise, however, the issue was resolved by also installing the EFS CSI Driver add-on, so that the scheduler knows there is a CSI driver component to use for the StorageClass rather than the default "gp2" EBS-based SC.

After this, my application came up and I can see a file created in the S3 bucket. I kindly request that you review the S3 CSI Driver to see what differs between it and the EFS CSI Driver (and why including the EFS CSI Driver solved this issue).

Your query:

Does changing the StringEquals condition to StringLike solve your issue?

I don't think this makes any difference; for me, StringLike worked just as smoothly as described in the doc.

Happy to follow up internally to help customers here!

passaro commented 1 month ago

Closing this issue. @tppalani, please reopen if the suggestion above did not work for you.

peterbosalliandercom commented 2 weeks ago

What steps did you take to fix this?

Ramneek-kalra commented 2 weeks ago

Hi @peterbosalliandercom,

Thanks for the follow-up. I just installed the AWS EFS CSI driver add-on in addition, nothing more, and then deployed the application as normal.