crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0

[Bug]: 1.3.0 aws-provider breaks pod identity credential resolution #1252

Closed david-kirby closed 3 months ago

david-kirby commented 3 months ago

### Is there an existing issue for this?

### Affected Resource(s)

### Resource MRs required to reproduce the bug

In short, the manifests below:

  1. set up an aws-sqs provider that is patched to use a service account named crossplane
  2. set up the AWS ProviderConfig named prod to use IRSA for credentials (IMPORTANT NOTE: this is working with a Pod Identity association in EKS. The IAM role is configured for Pod Identity, not for IRSA)
  3. deploy a simple SQS queue
    
    apiVersion: sqs.aws.upbound.io/v1beta1
    kind: Queue
    metadata:
      name: my-queue
    spec:
      forProvider:
        name: my-queue
        region: us-east-1
      providerConfigRef:
        name: prod
    ---
    apiVersion: aws.upbound.io/v1beta1
    kind: ProviderConfig
    metadata:
      name: prod
    spec:
      credentials:
        source: IRSA
    ---
    apiVersion: pkg.crossplane.io/v1
    kind: Provider
    metadata:
      name: provider-aws-sqs
    spec:
      package: xpkg.upbound.io/upbound/provider-aws-sqs:v1.2.1
      runtimeConfigRef:
        name: patch-service-account
    ---
    apiVersion: pkg.crossplane.io/v1beta1
    kind: DeploymentRuntimeConfig
    metadata:
      name: patch-service-account
    spec:
      deploymentTemplate:
        spec:
          selector: {}
          template:
            spec:
              serviceAccountName: crossplane
              containers: []

### Steps to Reproduce

I had the above manifests deployed. After updating the SQS provider to 1.3.0, deploying a new queue (or deleting and recreating the example) fails: the queue cannot be created.

### What happened?

Using version 1.2.1, the queue is created. After upgrading to 1.3.0, the resource does not get created. I tested this with other resources as well (sns, eks:clusterauth) and encountered the same error message whenever the provider version was 1.3.0.

### Relevant Error Output Snippet

```shell
message: 'connect failed: cannot initialize the Terraform plugin SDK async external
        client: cannot get terraform setup: cache manager failure: cannot calculate
        the hash for the credentials file: token file name cannot be empty'
```

### Crossplane Version

1.15.1

### Provider Version

1.3.0

### Kubernetes Version

_No response_

### Kubernetes Distribution

_No response_

### Additional Info

_No response_

haarchri commented 3 months ago

Can you try the following:

    apiVersion: aws.upbound.io/v1beta1
    kind: ProviderConfig
    metadata:
      name: demo-pod-identity
    spec:
      credentials:
        source: WebIdentity
        webIdentity:
          roleARN: arn:aws:iam::12345678910:role/demo
          tokenConfig:
            fs:
              path: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
            source: Filesystem

david-kirby commented 3 months ago

I tried the above ProviderConfig with the 1.3.0 aws-sqs provider and also updated my IAM role to allow the sts:AssumeRoleWithWebIdentity action. After trying to create the SQS queue again, I received this error:

    message: 'connect failed: cannot initialize the Terraform plugin SDK async external
      client: cannot get terraform setup: cache manager failure: cannot retrieve the
      AWS credentials: failed to refresh cached credentials, failed to retrieve credentials,
      operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts,
      3, https response error StatusCode: 400, RequestID: 3abbae7e-25fa-4c0f-915d-929ae090c3f2,
      InvalidIdentityToken: Incorrect token audience

I double-checked that the audience is set correctly on the AWS OIDC provider for the cluster:

    "ClientIDList": [
        "sts.amazonaws.com"
    ],

david-kirby commented 3 months ago

Alrighty, I think I got this fixed. After decoding the token at /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token, I found that it carries an audience of pods.eks.amazonaws.com.

Once I added that audience to the OIDC provider configuration for the cluster, the SQS queue could be created. The command aws iam get-open-id-connect-provider --open-id-connect-provider-arn <USE_ARN> should now return:

    "ClientIDList": [
        "sts.amazonaws.com",
        "pods.eks.amazonaws.com"
    ],

So for anyone else trying to use Pod Identity, here's my final setup to get it working with 1.3.0:

  1. Your OIDC provider must have pods.eks.amazonaws.com in the Audiences list
  2. Create an IAM role (e.g. crossplane-role) with a trust policy that allows AssumeRoleWithWebIdentity and the pods.eks.amazonaws.com service principal
  3. Since each AWS service provider you install (e.g. aws-sqs, aws-sns) gets its own unique service account name, I opted to patch them so that they all share the same crossplane service account. This way I create a single Pod Identity association that maps the IAM crossplane-role to the crossplane service account
  4. Deploy a 1.3.0 AWS service provider
  5. Deploy a ProviderConfig with the WebIdentity configuration that @haarchri suggested
  6. Deploy the SQS queue

    apiVersion: pkg.crossplane.io/v1beta1
    kind: DeploymentRuntimeConfig
    metadata:
      name: patch-service-account
    spec:
      deploymentTemplate:
        spec:
          selector: {}
          template:
            spec:
              serviceAccountName: crossplane
              containers: []
    ---
    apiVersion: pkg.crossplane.io/v1
    kind: Provider
    metadata:
      name: provider-aws-sqs
    spec:
      package: xpkg.upbound.io/upbound/provider-aws-sqs:v1.3.0
      runtimeConfigRef:
        name: patch-service-account
    ---
    apiVersion: aws.upbound.io/v1beta1
    kind: ProviderConfig
    metadata:
      name: prod
    spec:
      credentials:
        source: WebIdentity
        webIdentity:
          roleARN: arn:aws:iam::<ACCOUNT_ID>:role/crossplane-role
          tokenConfig:
            fs:
              path: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
            source: Filesystem
    ---
    apiVersion: sqs.aws.upbound.io/v1beta1
    kind: Queue
    metadata:
      name: demo-queue
    spec:
      forProvider:
        name: demo-queue
        region: us-east-1
      providerConfigRef:
        name: prod

IAM role trust policy for my crossplane-role:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/UNIQUE_ID"
                },
                "Action": "sts:AssumeRoleWithWebIdentity"
            },
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "pods.eks.amazonaws.com"
                },
                "Action": [
                    "sts:AssumeRole",
                    "sts:TagSession"
                ]
            }
        ]
    }

truongnht commented 1 month ago

@david-kirby your solution would work, but it is not a Pod Identity solution. It is rather a workaround that makes Pod Identity work through the IRSA path (since I still see the IRSA OIDC provider).
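
To make that distinction concrete, here is a rough sketch that labels each statement of a trust policy like the one above by the mechanism it enables. The labels and the function name classify_statement are hypothetical. The rule it applies is an assumption drawn from the policy in this thread: a Federated principal with sts:AssumeRoleWithWebIdentity is treated as the web identity (IRSA-style) path, and the pods.eks.amazonaws.com service principal with sts:AssumeRole is treated as the Pod Identity path.

```python
import json

# Trust policy copied from the thread; placeholders left as-is.
EXAMPLE_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/UNIQUE_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity"
    },
    {
      "Effect": "Allow",
      "Principal": {"Service": "pods.eks.amazonaws.com"},
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
""")


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def classify_statement(stmt: dict) -> str:
    """Label a trust statement (labels are illustrative, not AWS terms)."""
    principal = stmt.get("Principal", {})
    if not isinstance(principal, dict):  # e.g. "Principal": "*"
        principal = {}
    actions = _as_list(stmt.get("Action"))
    if "Federated" in principal and "sts:AssumeRoleWithWebIdentity" in actions:
        return "web-identity"
    services = _as_list(principal.get("Service"))
    if "pods.eks.amazonaws.com" in services and "sts:AssumeRole" in actions:
        return "pod-identity"
    return "other"


if __name__ == "__main__":
    for stmt in EXAMPLE_POLICY["Statement"]:
        print(classify_statement(stmt))
```

With the WebIdentity ProviderConfig above, it is the first statement that the provider's AssumeRoleWithWebIdentity call relies on, which is why the setup still depends on the cluster's OIDC provider.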