cert-manager / aws-privateca-issuer

Addon for cert-manager that issues certificates using AWS ACM PCA.
Apache License 2.0
184 stars 77 forks source link

feat: recover from previously failed attempt #291

Open woehrl01 opened 11 months ago

woehrl01 commented 11 months ago

If a previous certificate request failed, it will be currently never retried again.

This adds a configuration max-retry-duration to allow a transition time to recover from transient errors.

The PR includes:

jetstack-bot commented 11 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/cert-manager/aws-privateca-issuer/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
woehrl01 commented 11 months ago

Looking at the implementation at: https://github.com/cert-manager/issuer-lib/blob/main/controllers/request_controller.go#L306 I can see that it should be handled differently. I'm replacing the PR with an approach like that: It records the errors as intermediate errors including a maxretryduration.

woehrl01 commented 11 months ago

PR is now updated to reflect a similar behaviour as in the https://github.com/cert-manager/issuer-lib, by allowing a grace period of retries.

github-actions[bot] commented 11 months ago

Detected different CRDs in the config/crd/bases directory and charts/aws-pca-issuer/crds directory. Since both CRDs were modified in this commit(s), they were unable to be automatically synced. Please update the pull request with identical CRDs for this workflow to pass.

dcamzn commented 11 months ago

@woehrl01 cert-manager should retry failed issuance per their documentation (https://cert-manager.io/docs/faq/). Do you know if cert-manager is retrying? If it's not, any idea why it's not working as intended?

woehrl01 commented 11 months ago

@dcamzn No I don't know. According to the docs it's only retrying if it's not in a terminal state (marked as failed). I also looked through the source code and it looks like that it only retries a terminal state if you delete the certificaterequest (that's also a recommendation according to the faq). In my case we only have a window of 32h until the certificate is invalid and it had never retried/recreated a certificaterequest once during that time, after it failed.

Even an active "renew" command via cli isn't creating a new certificaterequest 48h after terminal failure.

It could be because we never set the condition Issuing to false. But Ready to false

dcamzn commented 11 months ago

@woehrl01 thank you for that info. we're going to keep looking into this. We'll take a look at the Issuing and Ready conditions that you called out. Thank you!

woehrl01 commented 11 months ago

@dcamzn thanks. One tip to reproduce the issue quite easily:

Expected scenario is that it would retry issuing the certificate afterwards automatically/eventually.

My PR will basically just allow this for a configurable grace period. We already use that change successful in production.

dcamzn commented 11 months ago

thanks @woehrl01 on steps to reproduce the issue. you read our minds!

dcamzn commented 11 months ago

If it fails at the point of the certificaterequest, we think the right behavior is for the certificaterequest to be deleted by the cert-manager controller and recreated by the cert-manager controller, so that all issuers get fixed.

we'll circle back after reproducing the issue.

woehrl01 commented 11 months ago

@dcamzn great, thank you! Let me share the following docs about the certificaterequest which states that it's not retrying the issuing after failure: https://cert-manager.io/docs/concepts/certificaterequest/

So I guess the Faq is maybe not correct.

woehrl01 commented 11 months ago

Did some additional research, looks like that any of those conditions do not match, and the CertificateRequest is not getting deleted:

https://github.com/cert-manager/cert-manager/blob/c6ff0136d2973a4aa14c6c08f86c0e254b1f79c6/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L222-L261

divyansh-gupta commented 11 months ago

I was able to reproduce and cut this SIM to cert-manager: https://github.com/cert-manager/cert-manager/issues/6405.

I saw that one of the times I reproduced, if I waited long enough, it did retry exactly once. I was not able to get that behavior consistently.

divyansh-gupta commented 11 months ago

I was able to get retries to work, see https://github.com/cert-manager/cert-manager/issues/6405 for details. It seems the retries just take longer between each try than they did before, which was a change in cert-manager 1.8.

woehrl01 commented 11 months ago

Thank you @divyansh-gupta for investigating, I also have read through the PR, but this is not what I see.

I would like to share the exact logs which shows that cert-manager is retrying, but it's not creating a new CertificateRequest resource.

Bildschirmfoto 2023-10-15 um 13 52 53

The "failed to retrieve provisioner" is the error returned by aws-privateca-issuer

woehrl01 commented 10 months ago

@dcamzn @divyansh-gupta what are your thought on this PR, after syncing with the cert-manager team?

cunningr commented 10 months ago

We are also hitting a very similar issue. If you create the cert and the AWSPCAIssuer at the same time, the Certificate request happens before the AWSPCAIssuer is ready. Although cert-manager seems to retry after 1h, it is never successful in creating the certificate.

divyansh-gupta commented 10 months ago

Hi @woehrl01 - let me share with your my results when replicating the exact situation you described:

  1. Create the certificate with NO issuer in place and describe it:
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe Certificate
Name:         pca-issuer-rsa-cert-retry-blah-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2023-10-26T20:46:37Z
  Generation:          1
  Resource Version:    1364963
  UID:                 65949e92-9dd4-47c5-885d-d654d4ffb963
Spec:
  Dns Names:
    issuer-rsa-example.com
  Issuer Ref:
    Group:  awspca.cert-manager.io
    Kind:   AWSPCAIssuer
    Name:   pca-issuer-rsa-2
  Private Key:
    Algorithm:  RSA
    Size:       2048
  Secret Name:  pca-issuer-rsa-cert-tls-retry-blah-1
  Subject:
    Organizations:
      aws
Status:
  Conditions:
    Last Transition Time:    2023-10-26T20:46:37Z
    Message:                 Issuing certificate as Secret does not exist
    Observed Generation:     1
    Reason:                  DoesNotExist
    Status:                  False
    Type:                    Ready
    Last Transition Time:    2023-10-26T20:46:37Z
    Message:                 The certificate request has failed to complete and will be retried: issuer could not be found
    Observed Generation:     1
    Reason:                  Failed
    Status:                  False
    Type:                    Issuing
  Failed Issuance Attempts:  1
  Last Failure Time:         2023-10-26T20:46:37Z
Events:
  Type     Reason     Age   From                                       Message
  ----     ------     ----  ----                                       -------
  Normal   Issuing    27s   cert-manager-certificates-trigger          Issuing certificate as Secret does not exist
  Normal   Generated  27s   cert-manager-certificates-key-manager      Stored new private key in temporary Secret resource "pca-issuer-rsa-cert-retry-blah-1-nfkg7"
  Normal   Requested  27s   cert-manager-certificates-request-manager  Created new CertificateRequest resource "pca-issuer-rsa-cert-retry-blah-1-1"
  Warning  Failed     27s   cert-manager-certificates-issuing          The certificate request has failed to complete and will be retried: issuer could not be found
  1. Check the CertificateRequest immediately and see that it failed because "issuer could not be found":
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe CertificateRequest
Name:         pca-issuer-rsa-cert-retry-blah-1-1
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/certificate-name: pca-issuer-rsa-cert-retry-blah-1
              cert-manager.io/certificate-revision: 1
              cert-manager.io/private-key-secret-name: pca-issuer-rsa-cert-retry-blah-1-nfkg7
API Version:  cert-manager.io/v1
Kind:         CertificateRequest
Metadata:
  Creation Timestamp:  2023-10-26T20:46:37Z
  Generation:          1
  Owner References:
    API Version:           cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Certificate
    Name:                  pca-issuer-rsa-cert-retry-blah-1
    UID:                   65949e92-9dd4-47c5-885d-d654d4ffb963
  Resource Version:        1364959
  UID:                     58a242be-e3ae-488e-81bc-1c20289c4838
Spec:
  Extra:
    authentication.kubernetes.io/pod-name:
      cert-manager-75d57c8d4b-j5f6s
    authentication.kubernetes.io/pod-uid:
      fbc686ee-060a-4215-acbc-39fab0993079
  Groups:
    system:serviceaccounts
    system:serviceaccounts:cert-manager
    system:authenticated
  Issuer Ref:
    Group:   awspca.cert-manager.io
    Kind:    AWSPCAIssuer
    Name:    pca-issuer-rsa-2
  Request:   LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ2xEQ0NBWHdDQVFBd0RqRU1NQW9HQTFVRUNoTURZWGR6TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBMit5MGcrQzRMdzhwK1VqUlpPSW8zZ0lUQ1BHQ1RHRVBKeUpsL1FQNDdrdHJuZld4ClFyTTMrSUFRdkY4dkNSSTI2dlVBWXo0bnJNeUppbXpsK2FHQ2JPeE8wbmpaRXhjM0RnYUFRYkNHN1ZYQ3NXUGEKUnp6NUpDUCtPSmdFT3MzT1drT0lWdXJ5UE95TEpnVmdoUHR2SURTcGJNY0ozQzNKVmtoSkJQNTVVdkdPeFBrWgpDRUhpaXlsbVdXNEVzMnYrUjRlOGtxUW5kaWJLa1NtNk4wR2g0Ull1N1ppcGpPK1A0LzNYTkNZWFN4Z3JISVRPCjg5TVNYdFpCbENhV1FmR3doZnFCZnJnc1FCdDhwa2w1OVlSTTRadW41aEJUd0IzRFYwUGVGdm80VEZBcWFTbzgKZXU0MEJpMXJMUmszejFVMnBtaWVScWtweVA1aGZzVGVkNEZ2ZlFJREFRQUJvRUV3UHdZSktvWklodmNOQVFrTwpNVEl3TURBaEJnTlZIUkVFR2pBWWdoWnBjM04xWlhJdGNuTmhMV1Y0WVcxd2JHVXVZMjl0TUFzR0ExVWREd1FFCkF3SUZvREFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBbVJwd2Fjbm9naFRVZ2Izc0xJY0FtMmttQXNTRWU0N1AKeE44cmtHRnY4YUdzSmpJaFYyb09QdjkrWkIzNmlEcy9KVmRCaGFwZlpvNTBBNW9DK2hNRzUrS2FlUUlrTW1pVApTUjhGVFVSak1IUXFQSCttenlqNVoxNXErYUhERk1NcXRxeml4WlFDSnJ4V09rbFVIK3RPelR5TytFTmg4bW0vCit0SHFqdnY3clA4eWlKY1JlUVlSMVErQndGTCs0b0NJcjFnYlB5bmNuVjBsUkhCK0NpOFVHWHN4bm1IOG9xa1YKdTkyaU5rUHZ0dTk0NGNqRTVva1hWd1NHc0dDZ1ZBeGNZd0NnbEpNaW9ESmd6NzRBRW5uMlVGcnZXelVpM2lMdApRQUxuOFVWeXYweUF0Tjhsc3V4aGlCQUlJVzUyNENHaCt6MGxsRzVGSWVpZ1E0a1lxc1BueGc9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
  UID:       f5c14e33-830f-420e-86c1-4d6bce4197ef
  Username:  system:serviceaccount:cert-manager:cert-manager
Status:
  Conditions:
    Last Transition Time:  2023-10-26T20:46:37Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2023-10-26T20:46:37Z
    Message:               issuer could not be found
    Reason:                Failed
    Status:                False
    Type:                  Ready
Events:
  Type     Reason           Age   From                                       Message
  ----     ------           ----  ----                                       -------
  Normal   cert-manager.io  87s   cert-manager-certificaterequests-approver  Certificate request has been approved by cert-manager.io
  Warning  Failed           87s   awspcaissuer-controller                    issuer could not be found
  1. Deploy the issuer and check its Ready, which it is:
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe AWSPCAIssuer
Name:         pca-issuer-rsa-2
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  awspca.cert-manager.io/v1beta1
Kind:         AWSPCAIssuer
Metadata:
  Creation Timestamp:  2023-10-26T20:48:42Z
  Generation:          1
  Resource Version:    1365212
  UID:                 cb4cb08a-e0c3-4ae5-b7f3-9925ed49428b
Spec:
  Arn:     arn:aws:acm-pca:us-east-1:444007060692:certificate-authority/da6896c0-cd44-41e9-b436-c291c2018b16
  Region:  us-east-1
Status:
  Conditions:
    Last Transition Time:  2023-10-26T20:48:46Z
    Message:               Issuer verified
    Reason:                Verified
    Status:                True
    Type:                  Ready
Events:
  Type    Reason    Age              From                     Message
  ----    ------    ----             ----                     -------
  Normal  Verified  1s (x2 over 4s)  awspcaissuer-controller  Issuer verified
  1. Wait until the 1 hour mark so retry happens

  2. See the certificate failed the second retry -

[ec2-user@ip-172-31-45-200 ~]$ kubectl describe Certificate
Name:         pca-issuer-rsa-cert-retry-blah-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2023-10-26T20:46:37Z
  Generation:          1
  Resource Version:    1371932
  UID:                 65949e92-9dd4-47c5-885d-d654d4ffb963
Spec:
  Dns Names:
    issuer-rsa-example.com
  Issuer Ref:
    Group:  awspca.cert-manager.io
    Kind:   AWSPCAIssuer
    Name:   pca-issuer-rsa-2
  Private Key:
    Algorithm:  RSA
    Size:       2048
  Secret Name:  pca-issuer-rsa-cert-tls-retry-blah-1
  Subject:
    Organizations:
      aws
Status:
  Conditions:
    Last Transition Time:    2023-10-26T20:46:37Z
    Message:                 Issuing certificate as Secret does not exist
    Observed Generation:     1
    Reason:                  DoesNotExist
    Status:                  False
    Type:                    Ready
    Last Transition Time:    2023-10-26T21:46:37Z
    Message:                 The certificate request has failed to complete and will be retried: issuer could not be found
    Observed Generation:     1
    Reason:                  Failed
    Status:                  False
    Type:                    Issuing
  Failed Issuance Attempts:  2
  Last Failure Time:         2023-10-26T21:46:37Z
Events:
  Type     Reason     Age                From                                       Message
  ----     ------     ----               ----                                       -------
  Normal   Issuing    22m (x2 over 82m)  cert-manager-certificates-trigger          Issuing certificate as Secret does not exist
  Normal   Requested  22m (x2 over 82m)  cert-manager-certificates-request-manager  Created new CertificateRequest resource "pca-issuer-rsa-cert-retry-blah-1-1"
  Warning  Failed     22m (x2 over 82m)  cert-manager-certificates-issuing          The certificate request has failed to complete and will be retried: issuer could not be found
  Normal   Generated  22m                cert-manager-certificates-key-manager      Stored new private key in temporary Secret resource "pca-issuer-rsa-cert-retry-blah-1-vfczn"

but see the CertificateRequest shows the cert issued anyway! And I checked, the cert exists and was issued -

[ec2-user@ip-172-31-45-200 ~]$ kubectl describe CertificateRequest
Name:         pca-issuer-rsa-cert-retry-blah-1-1
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/certificate-name: pca-issuer-rsa-cert-retry-blah-1
              cert-manager.io/certificate-revision: 1
              cert-manager.io/private-key-secret-name: pca-issuer-rsa-cert-retry-blah-1-vfczn
API Version:  cert-manager.io/v1
Kind:         CertificateRequest
Metadata:
  Creation Timestamp:  2023-10-26T21:46:37Z
  Generation:          1
  Owner References:
    API Version:           cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Certificate
    Name:                  pca-issuer-rsa-cert-retry-blah-1
    UID:                   65949e92-9dd4-47c5-885d-d654d4ffb963
  Resource Version:        1371939
  UID:                     a416b3e9-589a-4f53-9b53-78498fdaf6cb
Spec:
  Extra:
    authentication.kubernetes.io/pod-name:
      cert-manager-75d57c8d4b-j5f6s
    authentication.kubernetes.io/pod-uid:
      fbc686ee-060a-4215-acbc-39fab0993079
  Groups:
    system:serviceaccounts
    system:serviceaccounts:cert-manager
    system:authenticated
  Issuer Ref:
    Group:   awspca.cert-manager.io
    Kind:    AWSPCAIssuer
    Name:    pca-issuer-rsa-2
  Request:   LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ2xEQ0NBWHdDQVFBd0RqRU1NQW9HQTFVRUNoTURZWGR6TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBMWVPb1VYbG9ZdVk0SXFKK252UXRiRUpDcFdkWk1GMHY3aTh1UDN3VG9DRlpvODRICmZoWTM4YXFvaHllY2NOZUZxdjRpeEV3Nm92SHY2ZUJYTTNtN2RCT3RsajdtblV4Rk5OSjdEdW05N0R0bVRURkUKSGZTTk9ES2d0L2ZsUktYS1FBUUNqTlhxSTRtZ2pEOHNoVlVEWHNhSjVEQVhqOUw2U3p5QWdzMXpldUJ2WGJuZApmMW1rRXFhMm4vZVA2SU5qRTJsY1V5dmdveGp5K0Myb2cramU3Y0FTZEM0SjYrdDFyUHA3T1FORUNaRG9jaDIyClBrZ3BXbHBNOXZXT0tvZjVVemFBMERuVEs3bnBFdDR5UHptcU5QWmwvK0VMOE45NFZCZEErb2dtdjVjNHVBcHcKRVFYbzFNeUFBTnd6RC85TmxHQTlvNWJnOVJwQzYxU0N5VUI2c3dJREFRQUJvRUV3UHdZSktvWklodmNOQVFrTwpNVEl3TURBaEJnTlZIUkVFR2pBWWdoWnBjM04xWlhJdGNuTmhMV1Y0WVcxd2JHVXVZMjl0TUFzR0ExVWREd1FFCkF3SUZvREFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBeXdza2FIZExRWWI3cFdiMXYvNDREd2JXMFp4Q2cyb1kKbmpSSk5oZ3VEVW43V3F3TlZDOXVnTWlWa0xvTmY3RDVmZ3dONFVNcWliR0NuYTY2WDlHeFJUS1pCNE5YTFE3Sgo4ZnV1cm1IaXF1Vm4vU0ZIbVRqL3NnMDFBVGJRWU9pWmxJUnhaS0oycHdzTjZwT3FNclU3NUplOFhZSEEyaTYvCkw0UGRSZHNVSkU4Z2doL0swb25mcEx6YUhBVmNPL25uS3M0NUFSVkl6MW84UVR3N0EzbnYrZ2RCZDc2YitzQWgKS1JZZUx1a0toMWYxSVBTblZqZmV5dGwwdFVJWktlQkcyWkN0emNuaVdlNXlhZ2xCYlB6aDJ2d3NVcDB1V1dobwo2ODg5d2VGUkE1R0Z6V3ZFWW1aK1VtUE5XZWNQRzNlMlI2blZoOVFTV2VseTNtbDdSZ0xWYXc9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
  UID:       f5c14e33-830f-420e-86c1-4d6bce4197ef
  Username:  system:serviceaccount:cert-manager:cert-manager
Status:
  Ca:           LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZekNDQWt1Z0F3SUJBZ0lSQUxKdllWT3V4UFFRTXp3Q3Y5ejNRT2N3RFFZSktvWklodmNOQVFFTEJRQXcKU3pFWE1CVUdBMVVFQ2d3T1FWZFRUMjVCYVhKVWQybDBZMmd4RnpBVkJnTlZCQXNNRGtGWFUwOXVRV2x5VkhkcApkR05vTVJjd0ZRWURWUVFEREE1QlYxTlBia0ZwY2xSM2FYUmphREFlRncweU16QTVNVEl4TnpNMk1qaGFGdzB6Ck16QTVNVEl4T0RNMk1qUmFNRXN4RnpBVkJnTlZCQW9NRGtGWFUwOXVRV2x5VkhkcGRHTm9NUmN3RlFZRFZRUUwKREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVBd3dPUVZkVFQyNUJhWEpVZDJsMFkyZ3dnZ0VpTUEwRwpDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRQ3g0WmFHSUZscjR3ck5SUzhIRDEwWWRhblM3TDhWCjBEb3dieDQyQVpPUkZmd2hpRGZybml2b3ErZDJaWkZrdGxGSTVqdzhRN2I4dkppWnZQdklYNzRyQjJmanZjUjEKZ2NtL1NFSGMwUWRmWWxnT3FycE0yQjI2ZEk0YXpkRTBLODBXbUxXRTA3MDkrTks5UWE5Tkg1bk82ZU1nT2haRQphN1FldHBJUmUxcmlzc0ZqQktOY2prSkwzOUl5Z1dqVnRjeXV2Y2dZVlNnaWJOMjRrQkl0WXFkWTlCSXRwWUNDCmxSWjlhWWpkOEp4SXFQMzFIUWJmSVNhREdoaUNuM2pyWnVwdGp4M29oeUlDV3BhWmVDUnRqb2dGMkZzRkI2dFIKYWNTZTM4YVlBZDg3S1ZRNUsxcUROTU01NitSUlRZeFJURmIyaTVqWUMzSXkwL1Q5aEl6ZFlMbnhBZ01CQUFHagpRakJBTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRkV6WXdZSDI2UjR5UVBWL2QrcDcvYWdCCm1ZWnpNQTRHQTFVZER3RUIvd1FFQXdJQmhqQU5CZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFiK2dSSnNaaW5GY1oKaWVQLzVWSXpBYzhVb0xmVmZrMzcyQkI4ZnZpRTluMDlxbEk2MU44ckhiMUJBeUQzVjNYTE4xSGRlVW90clFhZgpTcjNaVnJkZEFPbDlPaHJFQTBEWjlrYUZNL2p2ZXlIelJQTk1DYSttNnZvaGQycFk1RDUwOWFLT09lc3FmeERrClZJR1ZycXV3OWZhOXJTSWhMcDNaRkJaRTBXbDFPV3F3QzNjMGsxTDRtdHpDeGJ1eFVJQUJZZ0RWRU41M0RlNTUKdXpqUmlJWkE4TkFhb0RPajdMdDdKQmVFSCs4N0JZRmlCWjNMTWN3M1VOaG9UclpJUGs3NnlZVE5YeHRkdlgyOAoxTkZKYVdmSWRpRVJidEJlODYvbDhxVld0SU56RVYxdEhHZjVOSXY1cUJSUFlyQlptd09IdVNXUWFsZitDRUhLCnRORUVQSzBQc2c9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  Certificate:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZRENDQWtpZ0F3SUJBZ0lRRW52TmJrZ2dVV1FRU1Uxb2djKzZ2VEFOQmdrcWhraUc5dzBCQVFzRkFEQkwKTVJjd0ZRWURWUVFLREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVDd3dPUVZkVFQyNUJhWEpVZDJsMApZMmd4RnpBVkJnTlZCQU1NRGtGWFUwOXVRV2x5VkhkcGRHTm9NQjRYRFRJek1UQXlOakl3TkRZek4xb1hEVEl6Ck1URXlOVEl4TkRZek4xb3dEakVNTUFvR0ExVUVDaE1EWVhkek1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0MKQVE4QU1JSUJDZ0tDQVFFQTFlT29VWGxvWXVZNElxSitudlF0YkVKQ3BXZFpNRjB2N2k4dVAzd1RvQ0Zabzg0SApmaFkzOGFxb2h5ZWNjTmVGcXY0aXhFdzZvdkh2NmVCWE0zbTdkQk90bGo3bW5VeEZOTko3RHVtOTdEdG1UVEZFCkhmU05PREtndC9mbFJLWEtRQVFDak5YcUk0bWdqRDhzaFZVRFhzYUo1REFYajlMNlN6eUFnczF6ZXVCdlhibmQKZjFta0VxYTJuL2VQNklOakUybGNVeXZnb3hqeStDMm9nK2plN2NBU2RDNEo2K3QxclBwN09RTkVDWkRvY2gyMgpQa2dwV2xwTTl2V09Lb2Y1VXphQTBEblRLN25wRXQ0eVB6bXFOUFpsLytFTDhOOTRWQmRBK29nbXY1YzR1QXB3CkVRWG8xTXlBQU53ekQvOU5sR0E5bzViZzlScEM2MVNDeVVCNnN3SURBUUFCbzMwd2V6QWhCZ05WSFJFRUdqQVkKZ2hacGMzTjFaWEl0Y25OaExXVjRZVzF3YkdVdVkyOXRNQWtHQTFVZEV3UUNNQUF3SHdZRFZSMGpCQmd3Rm9BVQpUTmpCZ2ZicEhqSkE5WDkzNm52OXFBR1pobk13SFFZRFZSME9CQllFRkJHcHNSTjF4dEdiZFlYTnYvWm15RGNmCnBhTW5NQXNHQTFVZER3UUVBd0lGb0RBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQUZPWGFORGhyNHZSZHQ4NnQKdlpCaGgvbXpXOVZPekRtTWpPYmdIenlOdGl6eTZIalhqVWprZVZWTWFHeEQxaHZzUmwxbTFocTZXaTI2RHpuSgpLbm9xUnJWUnFsVExMSE5ocWRxWXc4U3lRVExmL1VZVmFjNklqTWIvVVJ4eDYwVjBGbEJmUGpaTVFOOVc3Z3lpCklDeG5aaytiMzNNN1AxWitTWHNvU2hwek5TKzVSdGxubXRYYVZUdjkrM0plYkNCRlVVSzRoMUZnalFpOE5xRkMKVGJ6bkhoM1BDUGR6N3BVZWp6dHo2MnBOc1V3Q1hXOE5IQlRLZW01U1MvWEpLbDluU2x5Wm5HSWdrc2ZpSGxiWgptL2NxUnR2Wlk5amRKeHNYcVhKRHFzUWp5bGtqTlN6TTJSMTF2T3EvWFdwSU1UZFptc2szK2VFZVBJNnlIT3ZiClo1aTF6UT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  Conditions:
    Last Transition Time:  2023-10-26T21:46:37Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2023-10-26T21:46:40Z
    Message:               certificate issued
    Reason:                Issued
    Status:                True
    Type:                  Ready
Events:
  Type    Reason           Age   From                                       Message
  ----    ------           ----  ----                                       -------
  Normal  cert-manager.io  21m   cert-manager-certificaterequests-approver  Certificate request has been approved by cert-manager.io
  Normal  Issued           21m   awspcaissuer-controller                    certificate issued
  1. Wait for the next retry in three hours and you'll see on the second retry that the Certificate object and the CertificateRequest object both show a successful issuance, grabbed the output here:
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe Certificate
Name:         pca-issuer-rsa-cert-retry-blah-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2023-10-26T20:46:37Z
  Generation:          1
  Resource Version:    1385854
  UID:                 65949e92-9dd4-47c5-885d-d654d4ffb963
Spec:
  Dns Names:
    issuer-rsa-example.com
  Issuer Ref:
    Group:  awspca.cert-manager.io
    Kind:   AWSPCAIssuer
    Name:   pca-issuer-rsa-2
  Private Key:
    Algorithm:  RSA
    Size:       2048
  Secret Name:  pca-issuer-rsa-cert-tls-retry-blah-1
  Subject:
    Organizations:
      aws
Status:
  Conditions:
    Last Transition Time:  2023-10-26T23:46:40Z
    Message:               Certificate is up to date and has not expired
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2023-11-25T23:46:37Z
  Not Before:              2023-10-26T22:46:37Z
  Renewal Time:            2023-11-15T23:26:37Z
  Revision:                1
Events:                    <none>
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe CertificateRequest
Name:         pca-issuer-rsa-cert-retry-blah-1-1
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/certificate-name: pca-issuer-rsa-cert-retry-blah-1
              cert-manager.io/certificate-revision: 1
              cert-manager.io/private-key-secret-name: pca-issuer-rsa-cert-retry-blah-1-p8ssp
API Version:  cert-manager.io/v1
Kind:         CertificateRequest
Metadata:
  Creation Timestamp:  2023-10-26T23:46:37Z
  Generation:          1
  Owner References:
    API Version:           cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Certificate
    Name:                  pca-issuer-rsa-cert-retry-blah-1
    UID:                   65949e92-9dd4-47c5-885d-d654d4ffb963
  Resource Version:        1385848
  UID:                     a72811cd-15e9-4e1f-9302-eeadee8d4bb8
Spec:
  Extra:
    authentication.kubernetes.io/pod-name:
      cert-manager-75d57c8d4b-j5f6s
    authentication.kubernetes.io/pod-uid:
      fbc686ee-060a-4215-acbc-39fab0993079
  Groups:
    system:serviceaccounts
    system:serviceaccounts:cert-manager
    system:authenticated
  Issuer Ref:
    Group:   awspca.cert-manager.io
    Kind:    AWSPCAIssuer
    Name:    pca-issuer-rsa-2
  Request:   LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ2xEQ0NBWHdDQVFBd0RqRU1NQW9HQTFVRUNoTURZWGR6TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBM29MaG1BR0hxWlIrUUdZUnNIekpqRTFDRWppR2pqQ3FmcEk0aU9lSTA4T1JDUWJDCklqaCtnM0RtWm5KbUR5MVg5RENzancxWGExa2tVZHVrRzRFYUgzSTdQMFdXOVFMaTFZWHVubFV4cklFTkVaWVUKdGtTT1YweUtvWDMxYW4wcWkyQXRjZUVydFJ2bUF5b0loK2FwKzBRYTRvczlWeitrcGk4Y1ZPSEVMZW5UYXRHaApJUEdwdWpvVFVlZlFKQjRqbG1mUFByekxlNVVDNDlVcUc0dWxERjlEeWp3dUlqdmM0N1U2QzJ5TVJGNjI1eU5EClRsWTl2QlhGV2Z3MkJZZFBhUUdhajluVDBndU5pdmczOEp1WFhQL0x4NGQzeGRDRUo4SVBkOENlcWdkYzNZR2sKRmVYUEVSUmQwSzhGWGVHNnlJd3IwcHdpcWxyL0p4RTlYWVV1MHdJREFRQUJvRUV3UHdZSktvWklodmNOQVFrTwpNVEl3TURBaEJnTlZIUkVFR2pBWWdoWnBjM04xWlhJdGNuTmhMV1Y0WVcxd2JHVXVZMjl0TUFzR0ExVWREd1FFCkF3SUZvREFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBblRGZkNkTzEyQmFGaUkycG84OVhtV0drN3RFSmk2VjIKUVQrQ1k2emVnODVyS2VaZmVDM3BXOXNmNnBRQ3ZOUlZsK096S0xXbU16OW9VbStWYjk2NTVFQlp3UFF4WnlaTQp1Zkd6WnBvQWpYZC95K2tDVVZ1ODNLdm5QVVFYcWUvdW4vRlM3Wm42S2hCS0VwWjl5aU1KRjFIbUdCSHFIV1luCk1KNE9VbGRYelhOa0dLQjZseCsrbzdrN1JXTStsZVJOQzRwbWkycElJck1NMmpqdEdmY1R3ek1PblBKNUxLME0KQjlLSFMwc2hKdlBoTzlHcC8xUldodzFEaTBzT3pOS1Rqc3N4TXQ2Y0o5YUNoVkh5NW1BNnpNaW1JTTlkcnpJdQpIcS9TYnZnK085cm5WUHIrS0k1bG5ZVVBEYndtRVFxN2RvdytTbk9KaUNsNUlvbjdEYzk5dkE9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
  UID:       f5c14e33-830f-420e-86c1-4d6bce4197ef
  Username:  system:serviceaccount:cert-manager:cert-manager
Status:
  Ca:           LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZekNDQWt1Z0F3SUJBZ0lSQUxKdllWT3V4UFFRTXp3Q3Y5ejNRT2N3RFFZSktvWklodmNOQVFFTEJRQXcKU3pFWE1CVUdBMVVFQ2d3T1FWZFRUMjVCYVhKVWQybDBZMmd4RnpBVkJnTlZCQXNNRGtGWFUwOXVRV2x5VkhkcApkR05vTVJjd0ZRWURWUVFEREE1QlYxTlBia0ZwY2xSM2FYUmphREFlRncweU16QTVNVEl4TnpNMk1qaGFGdzB6Ck16QTVNVEl4T0RNMk1qUmFNRXN4RnpBVkJnTlZCQW9NRGtGWFUwOXVRV2x5VkhkcGRHTm9NUmN3RlFZRFZRUUwKREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVBd3dPUVZkVFQyNUJhWEpVZDJsMFkyZ3dnZ0VpTUEwRwpDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRQ3g0WmFHSUZscjR3ck5SUzhIRDEwWWRhblM3TDhWCjBEb3dieDQyQVpPUkZmd2hpRGZybml2b3ErZDJaWkZrdGxGSTVqdzhRN2I4dkppWnZQdklYNzRyQjJmanZjUjEKZ2NtL1NFSGMwUWRmWWxnT3FycE0yQjI2ZEk0YXpkRTBLODBXbUxXRTA3MDkrTks5UWE5Tkg1bk82ZU1nT2haRQphN1FldHBJUmUxcmlzc0ZqQktOY2prSkwzOUl5Z1dqVnRjeXV2Y2dZVlNnaWJOMjRrQkl0WXFkWTlCSXRwWUNDCmxSWjlhWWpkOEp4SXFQMzFIUWJmSVNhREdoaUNuM2pyWnVwdGp4M29oeUlDV3BhWmVDUnRqb2dGMkZzRkI2dFIKYWNTZTM4YVlBZDg3S1ZRNUsxcUROTU01NitSUlRZeFJURmIyaTVqWUMzSXkwL1Q5aEl6ZFlMbnhBZ01CQUFHagpRakJBTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRkV6WXdZSDI2UjR5UVBWL2QrcDcvYWdCCm1ZWnpNQTRHQTFVZER3RUIvd1FFQXdJQmhqQU5CZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFiK2dSSnNaaW5GY1oKaWVQLzVWSXpBYzhVb0xmVmZrMzcyQkI4ZnZpRTluMDlxbEk2MU44ckhiMUJBeUQzVjNYTE4xSGRlVW90clFhZgpTcjNaVnJkZEFPbDlPaHJFQTBEWjlrYUZNL2p2ZXlIelJQTk1DYSttNnZvaGQycFk1RDUwOWFLT09lc3FmeERrClZJR1ZycXV3OWZhOXJTSWhMcDNaRkJaRTBXbDFPV3F3QzNjMGsxTDRtdHpDeGJ1eFVJQUJZZ0RWRU41M0RlNTUKdXpqUmlJWkE4TkFhb0RPajdMdDdKQmVFSCs4N0JZRmlCWjNMTWN3M1VOaG9UclpJUGs3NnlZVE5YeHRkdlgyOAoxTkZKYVdmSWRpRVJidEJlODYvbDhxVld0SU56RVYxdEhHZjVOSXY1cUJSUFlyQlptd09IdVNXUWFsZitDRUhLCnRORUVQSzBQc2c9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  Certificate:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZRENDQWtpZ0F3SUJBZ0lRYlJmUUJFbGhwcTRZUzMwK0RJb2ZLakFOQmdrcWhraUc5dzBCQVFzRkFEQkwKTVJjd0ZRWURWUVFLREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVDd3dPUVZkVFQyNUJhWEpVZDJsMApZMmd4RnpBVkJnTlZCQU1NRGtGWFUwOXVRV2x5VkhkcGRHTm9NQjRYRFRJek1UQXlOakl5TkRZek4xb1hEVEl6Ck1URXlOVEl6TkRZek4xb3dEakVNTUFvR0ExVUVDaE1EWVhkek1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0MKQVE4QU1JSUJDZ0tDQVFFQTNvTGhtQUdIcVpSK1FHWVJzSHpKakUxQ0VqaUdqakNxZnBJNGlPZUkwOE9SQ1FiQwpJamgrZzNEbVpuSm1EeTFYOURDc2p3MVhhMWtrVWR1a0c0RWFIM0k3UDBXVzlRTGkxWVh1bmxVeHJJRU5FWllVCnRrU09WMHlLb1gzMWFuMHFpMkF0Y2VFcnRSdm1BeW9JaCthcCswUWE0b3M5Vnora3BpOGNWT0hFTGVuVGF0R2gKSVBHcHVqb1RVZWZRSkI0amxtZlBQcnpMZTVVQzQ5VXFHNHVsREY5RHlqd3VJanZjNDdVNkMyeU1SRjYyNXlORApUbFk5dkJYRldmdzJCWWRQYVFHYWo5blQwZ3VOaXZnMzhKdVhYUC9MeDRkM3hkQ0VKOElQZDhDZXFnZGMzWUdrCkZlWFBFUlJkMEs4RlhlRzZ5SXdyMHB3aXFsci9KeEU5WFlVdTB3SURBUUFCbzMwd2V6QWhCZ05WSFJFRUdqQVkKZ2hacGMzTjFaWEl0Y25OaExXVjRZVzF3YkdVdVkyOXRNQWtHQTFVZEV3UUNNQUF3SHdZRFZSMGpCQmd3Rm9BVQpUTmpCZ2ZicEhqSkE5WDkzNm52OXFBR1pobk13SFFZRFZSME9CQllFRlBuS1NUMkFwSnVBb1RZb3lTVzk2MjZkCmpBL1NNQXNHQTFVZER3UUVBd0lGb0RBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQVNVZEtLcUhvRHM2RnAvR2oKeVcvTGhxcFIzeVh5L1hBc0FKODRGQjZHVkhYcGpKMlBXblhRTHZwVVZrdHp3Yi8wTWtvSGloTWdvQi9vQksxNApHNm5mSkNtaWdoa3lEaTVaZGorN0UzdlVSVW1oYm12bDVXbWN0QitIa1Roa0M2L1NwTldRc0N1aDdGVDVIRXllClFvTlZ2amtyMXk5U0JqbzJTSi9JMXp6aGJvc2lldWhEQnZ3UDUvT3hORWtncFJzTTZ4SVBROTd2ZDZ0RnA5RFAKNUVEZjE5WEJRSjlQSm5Kdk83b2xoYXlGY0t4cEExampoSEFDaE00K2MwUXZ2SXlUVUlVZFJpMmtRZlpBaWZoMwpBWXhPWjNDMEFDMVQ5Qy9FS2RvZEFBTlF6ajU4b1FaUHl1K0ZjRmxKdzIyYVk5bVVZdFA0MWhUbXJ4VjdpTEM5CkE4eU5jdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  Conditions:
    Last Transition Time:  2023-10-26T23:46:37Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2023-10-26T23:46:40Z
    Message:               certificate issued
    Reason:                Issued
    Status:                True
    Type:                  Ready
Events:                    <none>

To further summarize: When a Certificate is requested before an Issuer is created, the Certificate will fail to issue. When the issuer is created, the Certificate will issue on the next retry (1 hour), but the Certificate object isn't being updated correctly to reflect that even though the CertificateRequest object shows a success. On the next retry (3 hours), the Certificate will also issue again and the Certificate object will also be updated to show that the certificate was issued successfully, so at this point, 2 certificates were issued, and everything is stable.

This is clearly the wrong behavior, and the correct behavior is that on the first retry, when the certificate was issued, the Certificate object should also be updated to reflect the successful issuance. That seems to be the root of the problem that needs to be addressed. I am not yet sure on where to address that issue - it seems that the cert-manager ReadinessController is responsible for taking a CertificateRequest that is successful and updating the Certificate object, so I am looking there: https://github.com/cert-manager/cert-manager/blob/master/pkg/controller/certificates/readiness/readiness_controller.go

woehrl01 commented 10 months ago

@divyansh-gupta thank you for the update. There must be definitely something wrong somewhere. If you look at my event log, you can clearly see that it retried multiple times and never succeed, without deleting the certificaterequest.

Despite I think one hour delay because of a "race condition" during setup is quite long. (in hour case it's not me issue of a deleter issuer but, just that the controller is somehow in a not full ready state (it fails that the controller can't receive the provisioner, even though it did exist the whole time.

divyansh-gupta commented 10 months ago

There seem to be two distinct problems:

  1. Upon a successful retry after an initial failure, it seems that Certificate objects are not always being marked as successfully issued the first time. This is out of the scope of this PR, and may be an issue with cert-manager's ReadinessController. We will leave this issue out of the scope of this discussion for now and will be looked into seperately.

  2. Our plugin is not retrying quickly enough when common "temporary" errors occur. This PR addresses that.

Thanks for finding these issues and more importanly creating a PR to help!

We will be accepting this PR, however there are two things that I'd like to consider as changes in this PR:

  1. We should default MaxRetryDuration to 180. That should allow for 2 retries, considering each retry is set to occur between 1-2 mins.

  2. We should not retry errors returned from the provisioner.sign method. Those should be rare and retry slowly/permanently fail, but I believe this PR marks those errors as temporary.

Let me know what you think of the suggestions above! Thanks for your help again!

github-actions[bot] commented 10 months ago

Detected different CRDs in the config/crd/bases directory and charts/aws-pca-issuer/crds directory. Since both CRDs were modified in this commit(s), they were unable to be automatically synced. Please update the pull request with identical CRDs for this workflow to pass.

woehrl01 commented 10 months ago

@divyansh-gupta Thank you for your feedback, and I agree with your points. I just updated a PR which reflects your suggestions.

Please reach out to me if you think additional changes are needed.

divyansh-gupta commented 10 months ago

Hi @woehrl01 - thank you! Noting the bot here:

Detected different CRDs in the config/crd/bases directory and charts/aws-pca-issuer/crds directory.
Since both CRDs were modified in this commit(s), they were unable to be automatically synced. Please update the pull request with identical CRDs for this workflow to pass.

Could you please make sure this is synced up? Thank you!

woehrl01 commented 10 months ago

@divyansh-gupta Just pushed so that there are no additional diffs of the CRDs

divyansh-gupta commented 10 months ago
failure-provisioner-not-found-temporary

This test is failing when run in our automated tests. I am assuming this is because the helm chart file change is not being pulled into our testing.

https://github.com/cert-manager/aws-privateca-issuer/actions/runs/6864833818/job/18667570653#step:15:345

woehrl01 commented 10 months ago

@divyansh-gupta thank you, looks like we need to "stabilize" the jitter for the tests. Just did that.

divyansh-gupta commented 10 months ago

Thank you! It looks like the automated testing is continuing to fail: https://github.com/cert-manager/aws-privateca-issuer/actions/runs/6893208888/job/18752298060#step:15:802.

woehrl01 commented 7 months ago

@divyansh-gupta thank you for your patience. I would like to confirm that this change fixed out issue in the past. I'll have a look at the failing tests and update the PR.

jetstack-bot commented 6 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
cunningr commented 4 months ago

Hi! Is this still being worked on?

woehrl01 commented 4 months ago

@cunningr Not sure what you mean with being worked on. It works like this smooth in production.

cunningr commented 4 months ago

@cunningr Not sure what you mean with being worked on. It works like this smooth in production.

Apologies - I was looking for a main branch release where I could pick up this fix.

woehrl01 commented 2 months ago

@bmsiegel tagging you, because you are contributing lately. Is there anything missing to get this merged?

bmsiegel commented 2 months ago

It looks like we need a successful test run, will start that now. It also looks like you'll need a rebase.

bmsiegel commented 2 months ago

https://github.com/cert-manager/aws-privateca-issuer/actions/runs/9666406821/job/26665886635

Testing failed here. I believe this is due to a missing rebase tho.

cert-manager-prow[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/cert-manager/aws-privateca-issuer/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
woehrl01 commented 2 months ago

@bmsiegel thanks. Just pushed a rebased version. Can you please rerun the tests?

bmsiegel commented 2 months ago

You have a typo in your commit message. I can run the testing but will need to fix that before we merge.

woehrl01 commented 2 months ago

Thanks, I will do a squash if we are ready to merge

bmsiegel commented 2 months ago

Great sounds good. Running tests now.

bmsiegel commented 2 months ago

https://github.com/cert-manager/aws-privateca-issuer/actions/runs/9684323999/job/26721868900

woehrl01 commented 2 months ago

@bmsiegel Do you have any hint, what could be the root cause for the failing tests. I'm looking since a while at my changes and couldn't find any relation why there would be no condition at all set?

bmsiegel commented 2 months ago
2024-06-26T18:04:11.124744179Z stderr F invalid value "180" for flag -max-retry-duration: parse error
2024-06-26T18:04:11.12488991Z stderr F Usage of /manager:
2024-06-26T18:04:11.124985356Z stderr F   -disable-approved-check
2024-06-26T18:04:11.12499068Z stderr F      Disables waiting for CertificateRequests to have an approved condition before signing.
2024-06-26T18:04:11.125041441Z stderr F   -health-probe-bind-address string
2024-06-26T18:04:11.125045097Z stderr F         The address the probe endpoint binds to. (default ":8081")
2024-06-26T18:04:11.125096734Z stderr F   -kubeconfig string
2024-06-26T18:04:11.12510018Z stderr F      Paths to a kubeconfig. Only required if out-of-cluster.
2024-06-26T18:04:11.125144205Z stderr F   -leader-elect
2024-06-26T18:04:11.125147624Z stderr F         Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
2024-06-26T18:04:11.125258051Z stderr F   -max-retry-duration duration
2024-06-26T18:04:11.1252624Z stderr F       Maximum duration to retry failed CertificateRequests. Set to 0 to disable. (default 180ns)
2024-06-26T18:04:11.125310172Z stderr F   -metrics-bind-address string
2024-06-26T18:04:11.125313392Z stderr F         The address the metric endpoint binds to. (default ":8080")
2024-06-26T18:04:11.125362966Z stderr F   -zap-devel
2024-06-26T18:04:11.125366635Z stderr F         Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
2024-06-26T18:04:11.125413951Z stderr F   -zap-encoder value
2024-06-26T18:04:11.125417062Z stderr F         Zap log encoding (one of 'json' or 'console')
2024-06-26T18:04:11.125475648Z stderr F   -zap-log-level value
2024-06-26T18:04:11.125479139Z stderr F         Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
2024-06-26T18:04:11.125521717Z stderr F   -zap-stacktrace-level value
2024-06-26T18:04:11.125524926Z stderr F         Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
2024-06-26T18:04:11.1255661Z stderr F   -zap-time-encoding value
2024-06-26T18:04:11.125569402Z stderr F         Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.

We currently copy our kind logs to s3, pulled this from there

woehrl01 commented 2 months ago

@bmsiegel thank you! Just pushed the right fix

bmsiegel commented 2 months ago
> kubectl logs aws-pca-issuer-1720444904-aws-privateca-issuer-6b8d97946-87n4z
flag provided but not defined: -max-retry-duration
Usage of /manager:
  -disable-approved-check
        Disables waiting for CertificateRequests to have an approved condition before signing.
  -health-probe-bind-address string
        The address the probe endpoint binds to. (default ":8081")
  -kubeconfig string
        Paths to a kubeconfig. Only required if out-of-cluster.
  -leader-elect
        Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
  -metrics-bind-address string
        The address the metric endpoint binds to. (default ":8080")
  -zap-devel
        Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
  -zap-encoder value
        Zap log encoding (one of 'json' or 'console')
  -zap-log-level value
        Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
  -zap-stacktrace-level value
        Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
  -zap-time-encoding value
        Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.

Since our testing uses our prod ecr version, we end up facing this error from the helm test.

woehrl01 commented 2 months ago

@bmsiegel Thank you for sharing the output. I don't exactly understand what you mean. Are the e2e tests using the prod ECR, which means that the helmchart should be modified to be backwards compatible? (e.g. removing the flag if it isn't set?)

bmsiegel commented 2 months ago

Yes, if there's a way we can make it backwards compatible that is preferable

woehrl01 commented 1 month ago

@bmsiegel I removed the default value from the helm chart, now it should be compatible with the old image as well.

woehrl01 commented 1 month ago

@bmsiegel looks like the tests are successful now. If you there are no additional changes required from your side I will do the squashing of the changes. Wdyt?

bmsiegel commented 1 month ago

Only thing I'd ask is that we remove everything except the necessary changes for the PR. You can cut new PRs for the other things you did here or we can take care of it. Would make the history a little easier to follow.