cert-manager / cert-manager

Automatically provision and manage TLS certificates in Kubernetes
https://cert-manager.io
Apache License 2.0

Make Route53 dns01 work with IAM roles for service accounts #2147

Closed hendrikhalkow closed 4 years ago

hendrikhalkow commented 5 years ago

cert-manager should support the new IAM Roles for Service Accounts (IRSA) feature of AWS. Instead of putting the assumed role into the ClusterIssuer, it should go into the service account.

If you set everything up as described in the linked AWS page and just put the web identity role into the ClusterIssuer on v0.10.1, you get a lot of errors:

error instantiating route53 challenge solver: unable to assume role: AccessDenied: Access denied\n\tstatus code: 403
munnerz commented 5 years ago

Hey, thanks for the feedback. I'm not personally too familiar with the AWS IAM APIs; #2083 upgraded our use of AWS client libraries to support IAM Roles for Service Accounts, as far as I am aware, and #1917 is where support for an explicit role field was added.

Would you be able to distill some of the information in those linked AWS docs for us, so we can work out what/if we need to make further changes? 😄

hendrikhalkow commented 5 years ago

Hi @munnerz, of course. The whole thing works like this:

In addition to your AWS EKS Kubernetes cluster, you run an OpenID Connect identity provider for your AWS account, associated with your cluster. Because Kubernetes service accounts are first-class citizens in IAM now, you can allow them to assume an IAM role by creating a role with a trust relationship like this (identifiers redacted):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated":
          "arn:aws:iam::0123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/FFFFFFFFFFFFFFFF0123456789ABCDEF"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/FFFFFFFFFFFFFFFF0123456789ABCDEF:sub": 
            "system:serviceaccount:cert-manager:cert-manager"
        }
      }
    }
  ]
}

This role gets a policy with the permission to modify the DNS zone as you already wrote in your documentation, ideally restricted to the required zones. Let's assume the ARN of this role is arn:aws:iam::0123456789012:role/cert-manager-demo.
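For anyone who prefers to generate the trust relationship rather than hand-edit it, here is a minimal sketch in Python that builds the same document shape as above. The function name and the placeholder account ID and OIDC provider ID are hypothetical and only for illustration; substitute your own values.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy that lets one Kubernetes service account
    call sts:AssumeRoleWithWebIdentity on the role (same shape as above)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # The OIDC provider registered in IAM for the cluster.
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Must match the service account the pod really runs as.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values (assumptions), not real identifiers.
    policy = irsa_trust_policy(
        "111111111111",
        "oidc.eks.eu-central-1.amazonaws.com/id/EXAMPLE",
        "cert-manager",
        "cert-manager",
    )
    print(json.dumps(policy, indent=2))
```

The only parts that vary per cluster are the provider path and the namespace/service account pair in the `sub` condition, which is why they are the function's parameters.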

To tell cert-manager to actually make use of that granted permission and assume that role, you annotate the service account like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::0123456789012:role/cert-manager-demo
  name: cert-manager
  namespace: cert-manager

Now a mutating admission controller will modify all pods running with that service account as follows:

apiVersion: v1
kind: Pod
# ...
spec:
  # ...
  containers:
  - name: ...
    # ...
    env:
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::0123456789012:role/cert-manager-demo
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    volumeMounts:
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token

According to the linked blog post, you need to update the AWS SDK for Go to version 1.23.13 or later, which is able to process the injected information. I am not sure whether you need to change anything in the code, as External-DNS actually works and they just call AssumeRole as you do.

The cluster issuer doesn't need the role attribute as this went to the service account:

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    ...
    solvers:
    - selector:
        dnsZones:
        - "example.com"
      dns01:
        route53:
          region: eu-central-1
          hostedZoneID: DNSZONEIDHERE
          # no more role here

I think in the end it's just updating the AWS SDK dependency, annotating the service account and adjusting the documentation.

Update: I think https://github.com/jetstack/cert-manager/pull/2083 and https://github.com/jetstack/cert-manager/pull/2086 do exactly this, but there hasn't been a release since then :)

munnerz commented 5 years ago

We've just today cut the v0.11.0-beta.0 release, which includes this (as well as quite a few other changes!)

If you've got the chance to give this a go and update this issue with the results, it'd be great to find out if there's anything more we need to do to support this 😄

hendrikhalkow commented 5 years ago

Yes, just tried it, but I am unable to create a cluster issuer due to https://github.com/jetstack/cert-manager/issues/2109. When creating the custom resource definitions, I got this validation error

error: error validating "https://raw.githubusercontent.com/jetstack/cert-manager/release-0.11/deploy/manifests/00-crds.yaml": error validating data: ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.solver.properties.dns01.properties.webhook.properties.config): unknown field "x-kubernetes-preserve-unknown-fields" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaProps; if you choose to ignore these errors, turn validation off with --validate=false

Workaround is using --validate=false which allows creating the custom resource definitions. Tested on Amazon EKS 1.14.7.

munnerz commented 5 years ago

I've added some comments in #2109 😄

munnerz commented 5 years ago

& yep, that field is only present in k8s 1.15 onwards. Setting --validate=false is absolutely fine to do. The error occurs because kubectl has validation based on OpenAPI schemas, and in 1.14 that field was not present. The Kubernetes apiserver will just silently drop this field when it gets submitted to an older apiserver 😄

hendrikhalkow commented 5 years ago

Hello there, I can confirm that opening port 443 as described in https://github.com/jetstack/cert-manager/issues/2109 made creating the cluster issuer work.

I also added the eks.amazonaws.com/role-arn annotation that made the admission controller mutate the cert manager pod. Here is an excerpt from kubectl describe pod:

    Environment:
      POD_NAMESPACE:                cert-manager (v1:metadata.namespace)
      AWS_ROLE_ARN:                 arn:aws:iam::xxxxxxxxxxxx:role/CertManager-demo
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from cert-manager-token-c595c (ro)

However, the challenge fails because the injected token cannot be read:

$ kubectl describe challenge ...
...
Error presenting challenge: Failed to change Route 53 record set: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
...
$ kubectl logs cert-manager-...
...
E1010 12:04:47.578424       1 controller.go:131] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token\ncaused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied" "key"="cert-manager/demo-example-com-xxxxxxxxxx-xxxxxxxxxx-xxxxxxxxxx"
...

As the container has no shell I am unable to check the current permissions, but I think it's just a small thing.

Edit: enabling the securityContext solved this. Everything is working as expected as soon as the required changes are made. The only thing left is adjusting the documentation and the Kubernetes manifests accordingly.

Changes to be made:

  securityContext:
    fsGroup: 1001
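
One plausible reading of why fsGroup helps, sketched as a plain POSIX read-permission check in Python. The owner, group, mode, and user ID values in the example are illustrative assumptions, not values confirmed anywhere in this thread; the sketch also ignores root overrides, ACLs, and capabilities.

```python
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Classic POSIX read check: owner bits, then group bits, then other bits."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    # Assumed "before" state: token owned by root, readable by owner only,
    # while the container runs as a non-root user (uid 1000 is hypothetical).
    print(can_read(0o600, 0, 0, 1000, {1000}))        # False

    # Assumed "after" state with fsGroup: 1001. The volume's group becomes
    # 1001, group read is granted, and the process gains 1001 as a
    # supplementary group.
    print(can_read(0o640, 0, 1001, 1000, {1000, 1001}))  # True
```

If that reading is right, any fsGroup value would do as long as the pod's processes end up in the group that owns the token file; 1001 is simply the value that was used here.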
cookandy commented 5 years ago

Hello,

I'm having some issues getting the IAM roles working with v0.11.0, and I fear I'm missing something simple.

Here is how I'm installing cert manager

  1. Apply v0.11 CRDs

    kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.11/deploy/manifests/00-crds.yaml --validate=false
  2. Create the cert-manager namespace and labels (I added both to be backwards compatible)

    kubectl create namespace cert-manager
    kubectl label namespace cert-manager cert-manager.io/disable-validation=true
    kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
  3. I then install cert manager v0.11.0 using Helm, and a custom values.yaml

    helm install --name cert-manager \
    --namespace cert-manager \
    -f values.yaml \
    --version 0.11.0 \
    jetstack/cert-manager

The values.yaml contains the annotation and securityContext @hendrikhalkow mentioned, as well as setting letsencrypt-dev as the defaultIssuer.

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/my-role

ingressShim:
  defaultIssuerName: "letsencrypt-dev"
  defaultIssuerKind: "ClusterIssuer"

securityContext:
  enabled: true
  fsGroup: 1001

  4. I then add my ClusterIssuer, which looks like this (notice the updated apiVersion)

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-dev
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: me@mydomain.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-dev
    solvers:
    - selector:
        dnsZones:
          - "mydomain"
      dns01:
        route53:
          region: us-east-1

Finally, I deploy my Ingress with the kubernetes.io/tls-acme: "true" annotation.

However, in the logs I'm seeing an error AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity:

"msg"="re-queuing item  due to error processing" "error"="Failed to determine Route 53 hosted zone ID: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: c5209031-eb99-11e9-b4ee-d30fa2245e4a" "key"="monitoring/grafana-dev-us-east-1-lets-encrypt-3071181437-3860980709-2148988824" 

Any ideas? Any help would be greatly appreciated.

hendrikhalkow commented 4 years ago

Hi @cookandy, the fact that you get this error message shows that you already set up your cert-manager correctly because AssumeRoleWithWebIdentity is already being attempted. Please check your trust policy (see code snippet above) where you allow the service account to perform that action.
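
When checking the trust policy, it can help to look at what the injected token actually claims. Service account tokens are JWTs, so the payload can be decoded without any AWS tooling. Below is a minimal Python sketch; it does not verify the signature, and the helper names and demo claims are hypothetical. It assumes you can get a copy of the token out of the pod by some means, which may be awkward since the image has no shell.

```python
import base64
import json


def jwt_claims(token):
    """Decode (NOT verify) the payload segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def unsigned_demo_token(claims):
    """Build a fake, unsigned token for demonstration only."""
    def enc(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}."


def mismatches(claims, expected_sub, expected_aud="sts.amazonaws.com"):
    """List the claims that differ from what the trust policy expects."""
    problems = []
    if claims.get("sub") != expected_sub:
        problems.append(f"sub is {claims.get('sub')!r}, expected {expected_sub!r}")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        problems.append(f"aud is {aud!r}, expected {expected_aud!r}")
    return problems


if __name__ == "__main__":
    token = unsigned_demo_token({
        "sub": "system:serviceaccount:cert-manager:cert-manager",
        "aud": ["sts.amazonaws.com"],
    })
    print(mismatches(jwt_claims(token), "system:serviceaccount:cert-manager:cert-manager"))
```

An empty list means the `sub` and `aud` claims line up with the values used in the trust policy condition; anything else points at the part of the policy to fix.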

baelish commented 4 years ago

> Yes, just tried it, but I am unable to create a cluster issuer due to #2109. When creating the custom resource definitions, I got this validation error
>
> error: error validating "https://raw.githubusercontent.com/jetstack/cert-manager/release-0.11/deploy/manifests/00-crds.yaml": error validating data: ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.solver.properties.dns01.properties.webhook.properties.config): unknown field "x-kubernetes-preserve-unknown-fields" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaProps; if you choose to ignore these errors, turn validation off with --validate=false
>
> Workaround is using --validate=false which allows creating the custom resource definitions. Tested on Amazon EKS 1.14.7.

Getting the same error on GKE (GitVersion:"v1.14.6-gke.1" server, GitVersion:"v1.15.2" client) and the same workaround solves the immediate issue. It would be nice, however, if the validation worked. Should I raise a separate issue for it or is it already on the todo list?

whereisaaron commented 4 years ago

I had the same problems with 0.11, first with the CRD validation failing and then with cert-manager unable to read the Service Account token:

unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token

The same --validate=false and securityContext.enabled: true changes fixed these for me, and DNS challenges are working for the IAM Service Account Role as well as for assuming cross-account roles.

BTW, I noticed the cross-account instructions on the website fail to mention that the policy in account X needs to include the ability to assume the role in account A, as well as for account A to trust the role in account X.
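
To make the two halves of that cross-account setup concrete, here is a small Python sketch that produces both documents: the permission policy attached to the role in account X, and the trust policy on the role in account A. The function name and the example ARNs are hypothetical placeholders.

```python
import json


def cross_account_policies(account_x_role_arn, account_a_role_arn):
    """Return (permission_policy_for_X, trust_policy_for_A)."""
    # Attached to the role in account X: allows it to assume the role in A.
    permission_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": account_a_role_arn,
            }
        ],
    }
    # Trust relationship on the role in account A: trusts the role in X.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": account_x_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return permission_policy, trust_policy


if __name__ == "__main__":
    # Placeholder ARNs (assumptions), not real roles.
    perm, trust = cross_account_policies(
        "arn:aws:iam::111111111111:role/cert-manager",
        "arn:aws:iam::222222222222:role/dns-manager",
    )
    print(json.dumps(perm, indent=2))
    print(json.dumps(trust, indent=2))
```

Both documents are needed; having only one of them is a common way to end up with an AccessDenied on the assume-role call.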

ssdowd commented 4 years ago

I am having the same issue described here. After following some of the steps above on one cluster, I was able to get the route53 dns01 challenge to work. However, I was unable to achieve that on another cluster. My conclusion is that this needs some explicit documentation explaining how to update the trust relationships, policies, etc. so there is a decent guide beyond an open GitHub issue.

Krishna1408 commented 4 years ago

I am getting the error "error instantiating route53 challenge solver: unable to assume role: NoCredentialProviders: no valid providers in chain" while working with cert-manager.

I am using latest version 0.12 and I have created an IAM role for my lets-encrypt ClusterIssuer. Below is my lets-encrypt cluster issuer:

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: xxxx-issuer
spec:
  acme:
    email: krishnakumar.sharma@sennder.com
    privateKeySecretRef:
      name: xxxx-issuer
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        route53:
          role: arn:aws:iam::xxxx:role/kube2iam-cert_manager_role
          hostedZoneID: xxxx
          region: eu-central-1
      selector:
        dnsZones:
        - sennder.com

Below is my IAM role: https://cert-manager.io/docs/configuration/acme/dns01/route53/#set-up-a-iam-role

shaunc commented 4 years ago

@hendrikhalkow ... I'm trying to follow your instructions above to enable in EKS. Just to confirm -- "arn:aws:iam::0123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/FFFFFFFFFFFFFFFF0123456789ABCDEF" is the oidc provider associated with the EKS cluster? ... it looks suspiciously simple :) -- I worry I am not understanding something.

So ... the service account request tokens for the role from the oidc endpoint, and injects them into pods in the cert-manager namespace, which allows the cert manager (and anything else in the namespace?) to use them. (Or is there some further config somewhere limiting what can use the serviceaccount?)

What is the security context? What is the "magic config" fsGroup: 1001? It would be useful for someone to provide links to documentation (and/or write some) for beginners like myself that aren't familiar with all the internals. (I had read https://cert-manager.io/docs/configuration/acme/dns01/route53/ ... then found this issue when I googled for what to put into the assume role policy.)

Update: I'm trying to install via kustomize. To what should I be applying the "securityContext"?

hendrikhalkow commented 4 years ago

@shaunc it's really that simple. Today, my working Helm values look like this:

securityContext:
  enabled: "true"
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::000000000000:role/CertManagerRoleName

Replace the 0 digits with your actual AWS account ID. The IAM role CertManagerRoleName has a policy attached that looks like this:

{
    "Statement": [
        {
            "Action": "route53:GetChange",
            "Effect": "Allow",
            "Resource": "arn:aws:route53:::change/*"
        },
        {
            "Action": "route53:ListHostedZonesByName",
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "route53:ChangeResourceRecordSets",
            "Effect": "Allow",
            "Resource": "arn:aws:route53:::hostedzone/YOUR_HOSTED_ZONE_ID"
        }
    ],
    "Version": "2012-10-17"
}

Replace YOUR_HOSTED_ZONE_ID with the actual ID of your Route53 zone. To check if everything works, check your cert manager pod:

kubectl describe pod -n cert-manager cert-manager-...

You should see a volume aws-iam-token which indicates that it's working.

sc250024 commented 4 years ago

Note: you will still need to use spec.solvers[X].dns01.route53.role if your Route53 zones live inside another account. The reason is that the IAM-linked service roles provide temporary credentials to your pod, but they will not assume the role for you.

The examples listed above from other users assume the role lives inside the same AWS account as the Route53 zones, which means the permissions to modify DNS records are attached to the role directly.

The AWS documentation gives an example of an AWS config:

[profile account_b_role]
source_profile = account_a_role
role_arn=arn:aws:iam::222222222222:role/account-b-role

[profile account_a_role]
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token 
role_arn=arn:aws:iam::111111111111:role/account-a-role
munnerz commented 4 years ago

@sc250024 thanks for the details 😄 would you be able to add a note to our docs to explain this for the next person? It'd probably save them a lot of time digging out and stumbling across this issue in future! 😄

sc250024 commented 4 years ago

@munnerz Sure.

thismatters commented 4 years ago

I'm seeing

unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token\ncaused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied"

I haven't been able to apply the securityContext.enabled: true fix since I don't use helm and like @shaunc I cannot find much information about it.

Edit: I was able to grab the manifest for the cert-manager deployment and add

    spec:
      securityContext:
        enabled: true
        fsGroup: 1001
      containers: ...

Seems to work after applying that.

derrickburns commented 4 years ago
    spec:
      securityContext:
        enabled: true
        fsGroup: 1001
      containers: ...

Please add this to the cert-manager deployment. This is required in AWS so that the cert-manager pod can access the token that has the credentials.

munnerz commented 4 years ago

Would somebody be able to create a PR to add this block into our Helm chart by default? I think it's harmless for users running outside of AWS, and should hopefully resolve issues like this across different cloud providers! 😄

sc250024 commented 4 years ago

@munnerz That's how people get coronavirus. Today, you're forcing Kubernetes annotations on them, even if they don't use it. It's a slippery slope.

munnerz commented 4 years ago

@sc250024 I don't think that's very appropriate - please be mindful of your words and how they may affect others. We want to ensure nobody feels excluded here, and comments like these could really upset anyone affected by the virus.


On the topic at hand, it looks like this PR: https://github.com/jetstack/cert-manager/commit/3b838758a34c1c56d20eaaa69246d68484585a2d changed how the securityContext is set.

@thismatters @derrickburns would you be able to elaborate on your solutions above with some full examples, including what version of CM you applied the patch to? As well, would you mind testing it out with the latest (v0.14.0) version of cert-manager too?

thismatters commented 4 years ago

I'm not really much of a kubernetes expert, nor does startup life (amidst a pandemic) permit much time for rework. Now that I've gotten everything working just right I'm pretty reticent to touch anything.

I was patching version 0.13.1. I downloaded the manifest (https://github.com/jetstack/cert-manager/releases/download/v0.13.1/cert-manager.yaml), found deployment/cert-manager, added the aforementioned securityContext block, and applied it to my cluster.

derrickburns commented 4 years ago

@munnerz The PR that you mentioned works for me. There is no more need for changes to work with IAM for Service Accounts. Thanks!

TBBle commented 4 years ago

It's not clear from this bug report, but it seems that I still need fsGroup: 1001 to use IRSA with cert-manager? Tested with 0.14.2, where the last-mentioned PR (#2455) is present.

I read @derrickburns 's last comment as meaning that the fsGroup setting is no longer needed...

Does this block need to be mentioned in the docs?

guisilva-cid commented 3 years ago

It finally worked on my cluster when I carefully read all the instructions on this thread!

kudos to @hendrikhalkow and @munnerz !

zerthimon commented 3 years ago

It would be nice to have this documented in the helm chart values.yaml or made default.

TBBle commented 3 years ago

As mentioned in the k8s 1.19 release notes for EKS, I believe the fsGroup setting should not be needed from k8s 1.19 onwards, due to the implementation of https://github.com/kubernetes/enhancements/pull/1598.

nikhileshva commented 3 years ago

Hi, @hendrikhalkow I've followed exactly how you mentioned here for the setup, but I get the same error as @cookandy gets.

Here is my trust policy, which I don't think is incorrect (or if it is, please let me know)

{
  "Sid": "",
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::012345678901:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/5............237"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "oidc.eks.eu-central-1.amazonaws.com/id/5............237:sub": "system:serviceaccount:cert-manager:cert-manager"
    }
  }
}

I also have this in the pod description:

Environment:
  POD_NAMESPACE:                cert-manager (v1:metadata.namespace)
  AWS_DEFAULT_REGION:           eu-central-1
  AWS_REGION:                   eu-central-1
  AWS_ROLE_ARN:                 arn:aws:iam::012345678901:role/cert-manager
  AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
  /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
  /var/run/secrets/kubernetes.io/serviceaccount from cert-manager-token-2jxpg (ro)

and this is the values.yaml

image:
  tag: v0.16.1
serviceAccount:
  annotations: 
    eks.amazonaws.com/role-arn: arn:aws:iam::{{ .Values | get "aws.account" }}:role/cert-manager
securityContext:
  enabled: true

Failed to determine Route 53 hosted zone ID: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403

I don't really understand what I am missing.

sc250024 commented 3 years ago

@nikhileshva Did you check out the docs here? https://cert-manager.io/docs/configuration/acme/dns01/route53

nikhileshva commented 3 years ago

@sc250024 I followed the doc and also the thread here. My initial setup was working with kube2iam, so I have the route53 role policy in place, and it is working. Now, switching to the IRSA setup, the first error I saw was this:

Failed to determine Route 53 hosted zone ID: WebIdentityErr: failed fetching WebIdentity token: \ncaused by: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token\ncaused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied"

I figured that my service account was not set up correctly as it did not have the correct namespace there. After fixing this, I got the error mentioned in my previous post. Adding securityContext, as mentioned here in the thread, didn't work either.

EDIT: I'm not using cross account

some-random-dude123 commented 3 years ago

@nikhileshva did you fix your issue? I have the same one. IRSA is in place, the annotation is on the service account, the security context is set... but I still get the error "is not authorized to perform: sts:AssumeRole on resource ..."

Edit: I figured out my issue. When you're not doing a cross-account, the role property is not needed in the ClusterIssuer. That role ARN must only be on the service account.
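
A small Python sketch of that rule of thumb, combined with the cross-account note earlier in the thread: the role field in the ClusterIssuer is only needed when the hosted zone lives in a different account than the role on the service account. The function names are hypothetical and the ARNs are placeholders.

```python
def arn_account(arn):
    """Extract the account ID from an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[4]


def needs_explicit_role(service_account_role_arn, zone_account_id):
    """True when the Route53 zone is in another account, so the issuer
    needs its own role to assume on top of the service account role."""
    return arn_account(service_account_role_arn) != zone_account_id


if __name__ == "__main__":
    sa_role = "arn:aws:iam::111111111111:role/cert-manager"  # placeholder
    print(needs_explicit_role(sa_role, "111111111111"))  # same account
    print(needs_explicit_role(sa_role, "222222222222"))  # cross-account
```

In the same-account case the role ARN belongs only on the service account annotation, and the issuer keeps just region and hostedZoneID.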

ishworg commented 2 years ago

thanks @some-random-dude123

I am providing full ClusterIssuer on Staging for anyone who lands here from $searchEngine.

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: le-stg-issuer
spec:
  acme:
    email: <my@email.com>
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: le-stg-issuer-account-key
    solvers:
    - selector:
        dnsZones:
        - "<hostedZoneDNS>"
      dns01:
        route53:
          region: <awsRegion>
          hostedZoneID: <hostedZoneID>

Sample kuard app using Nginx Inc. IC:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuard
spec:
  selector:
    matchLabels:
      app: kuard
  replicas: 2
  template:
    metadata:
      labels:
        app: kuard
    spec:
      containers:
      - image: gcr.io/kuar-demo/kuard-amd64:1
        imagePullPolicy: Always
        name: kuard
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kuard
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  selector:
    app: kuard

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kuard
  annotations:
    #  NB: it's a `cert-manager.io/cluster-issuer` and not `cert-manager.io/issuer`.
    cert-manager.io/cluster-issuer: "le-stg-issuer"
    nginx.org/proxy-connect-timeout: "60s"
    nginx.org/proxy-read-timeout: "30s"
    nginx.org/client-max-body-size: "6m"
    nginx.org/server-snippets: |
      # https://cipherlist.eu/
      ssl_protocols TLSv1.2 TLSv1.3;
spec:
  ingressClassName: "nginx"
  tls:
  - hosts:
    - my.host.name
    secretName: kuard-stg-tls
  rules:
  - host: my.host.name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kuard
            port:
              number: 80
f-ld commented 2 years ago

I know this is an old ticket that is already closed, but I came here for an issue discussed here and still had it after going through the previous comments, so I am sharing my troubles and how I solved them.

On cert manager 1.5.5, whatever the ClusterIssuer configuration (with or without role) I have the same error:

E0428 11:20:28.782720       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="failed to change Route 53 record set: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: b37eb35f-139b-4e6d-869a-e82cd1dd7fb9" "key"="<namespace>/<certificate-name>-gpwrv-1785813032-1499251413" 

The reason was that I deployed cert manager using helm giving "foobar" as name for the release. So the cert manager pod was using a service account called foobar-cert-manager ... while my AWS web identity configuration was expecting cert-manager. Enforcing the cert manager service account to be cert-manager fixed it. This was achieved using this in values.yaml for the chart deployment:

serviceAccount:
  create: true
  name: "cert-manager"  # This is the line that was missing
  annotations:
    k8s.dolby.io/role-arn: arn:aws:iam::<account>:role/<rolename>
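
A rough Python sketch of how the release name can end up in the token's subject. It assumes the chart follows the common Helm "fullname" convention (release name plus chart name, unless the release name already contains the chart name); that convention is an assumption here, not something confirmed from the chart source, and the function names are hypothetical.

```python
def chart_fullname(release_name, chart_name="cert-manager", fullname_override=None):
    """Common Helm naming convention (assumed): truncate to 63 chars, trim '-'."""
    if fullname_override:
        name = fullname_override
    elif chart_name in release_name:
        name = release_name
    else:
        name = f"{release_name}-{chart_name}"
    return name[:63].rstrip("-")


def expected_subject(namespace, release_name, service_account_name=None):
    """The 'sub' value the trust policy condition has to match."""
    name = service_account_name or chart_fullname(release_name)
    return f"system:serviceaccount:{namespace}:{name}"


if __name__ == "__main__":
    # Release named "foobar" without an explicit serviceAccount.name.
    print(expected_subject("cert-manager", "foobar"))
    # Same release with serviceAccount.name pinned to "cert-manager".
    print(expected_subject("cert-manager", "foobar", "cert-manager"))
```

Either pin serviceAccount.name as shown in the values above, or put the generated name into the trust policy; what matters is that the two agree.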
brockoffdev commented 1 year ago

> Note: you will still need to use spec.solvers[X].dns01.route53.role if your Route53 zones live inside another account. The reason is that the IAM-linked service roles provide temporary credentials to your pod, but they will not assume the role for you.
>
> The examples listed above from other users assume the role lives inside the same AWS account as the Route53 zones, which means the permissions to modify DNS records are attached to the role directly.
>
> The AWS documentation gives an example of an AWS config:
>
> [profile account_b_role]
> source_profile = account_a_role
> role_arn=arn:aws:iam::222222222222:role/account-b-role
>
> [profile account_a_role]
> web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
> role_arn=arn:aws:iam::111111111111:role/account-a-role

Hi @sc250024 circling back to this...

We're utilizing IRSA for our Kubernetes cluster, and are attempting to assume another role from the service account role that has been assigned to cert manager (as a means of cross-account role assumption). Unfortunately, there isn't a way to inform Cert Manager, from the looks of things, to utilize the Web Token Identity File to do this effectively (as you've laid out here). We can't even set the AWS_PROFILE environment variable with a mounted config, as this would likely make the original role not work properly.

Any idea of how Cert Manager could handle this?

gecube commented 11 months ago

I have the same issues with cross-account IRSA. When I used explicit access and secret keys in a K8s secret, everything was fine, but I wanted to remove all tokens and make everything automagic, so I switched to IRSA. Now I can't debug what's going on as I keep getting the following message again and again:

I1027 11:15:50.526798       1 route53.go:76] "cert-manager/route53-session-provider: using ambient credentials"
I1027 11:15:50.526866       1 route53.go:93] "cert-manager/route53-session-provider: assuming role" role="arn:aws:iam::308712144460:role/dns-manager"
E1027 11:15:50.531612       1 sync.go:282] "cert-manager/challenges/finalizer: error cleaning up challenge" err=<
    error instantiating route53 challenge solver: unable to assume role: WebIdentityErr: failed to retrieve credentials
    caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
        status code: 403, request id: d82d60c1-9dcc-4e83-82d7-3f76a489bd67

There is no further debug output; this is the maximum log level. I really wonder what details cert-manager is using when it tries AssumeRoleWithWebIdentity. I checked the permissions several times and I don't see any issues with them. However, cross-account IRSA is complicated, not as easy as I expected. What's interesting is that I was able to set it up for other services, but not for cert-manager.

Kindly looking for any assistance.

@brockoffdev @nikhileshva Did you manage to get it working?

danielrive commented 10 months ago

I am having the same issue. I checked the events in AWS CloudTrail and saw that AWS is providing the access key and token, but I don't see any access-denied for the assume-role with web identity. To test the process I modified the permissions in the role's trust policy, and then I saw some access-denied events in CloudTrail. I am not sure what is happening :(

meysam81 commented 5 months ago

I have read all the comments and applied every combination of changes proposed, none worked.

But, for anyone else getting to my comment, here's what I did that made it work...

The eks.amazonaws.com/role-arn annotation on ServiceAccount should work. At least that's what I expected to see having worked with External Secrets Operator before. BUT IT DOES NOT! :x:

On the other hand, it seems that following the normal AWS CLI workflow, i.e. setting the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE env vars directly on the cert-manager Deployment, makes it work.

As it turns out, the cert-manager Deployment will take the missing AWS_REGION from the ClusterIssuer.

Here's the piece of code that makes it work anyway if you're curious:

https://github.com/cert-manager/cert-manager/blob/aa17b34edea7d0cce6efaa099053b03606d3f84e/pkg/issuer/acme/dns/route53/route53.go#L96-L99

Additionally, you'd need the Kubernetes projected volumes manually mounted on the Deployment too!

Here's the full list of changes on the Deployment/cert-manager for your reference:

# ... truncated ...
containers:
  - env:
      - name: AWS_ROLE_ARN
        value: arn:aws:iam::XXXXXXXXXXXX:role/cert-manager
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: /var/run/secrets/eks.amazonaws.com/token
    volumeMounts:
      - mountPath: /var/run/secrets/eks.amazonaws.com
        name: token
        readOnly: true
securityContext:
  fsGroup: 1001
volumes:
  - name: token
    projected:
      defaultMode: 420
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com
            expirationSeconds: 3600
            path: token
# ... truncated ...

The IAM Role trust policy isn't anything unusual.

"OIDC_URL:sub": "system:serviceaccount:cert-manager:cert-manager",
"OIDC_URL:aud": "sts.amazonaws.com"