Open woehrl01 opened 11 months ago
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Looking at the implementation at: https://github.com/cert-manager/issuer-lib/blob/main/controllers/request_controller.go#L306 I can see that it should be handled differently. I'm replacing the PR with an approach like that: It records the errors as intermediate errors including a maxretryduration.
PR is now updated to reflect a similar behaviour as in the https://github.com/cert-manager/issuer-lib, by allowing a grace period of retries.
Detected different CRDs in the config/crd/bases
directory and charts/aws-pca-issuer/crds
directory.
Since both CRDs were modified in this commit(s), they were unable to be automatically synced. Please update the pull request with identical CRDs for this workflow to pass.
@woehrl01 cert-manager should retry failed issuance per their documentation (https://cert-manager.io/docs/faq/). Do you know if cert-manager is retrying? If it's not, any idea why it's not working as intended?
@dcamzn No I don't know. According to the docs it's only retrying if it's not in a terminal state (marked as failed). I also looked through the source code and it looks like that it only retries a terminal state if you delete the certificaterequest (that's also a recommendation according to the faq). In my case we only have a window of 32h until the certificate is invalid and it had never retried/recreated a certificaterequest once during that time, after it failed.
Even an active "renew" command via cli isn't creating a new certificaterequest 48h after terminal failure.
It could be because we never set the condition Issuing
to false. But Ready
to false
@woehrl01 thank you for that info. we're going to keep looking into this. We'll take a look at the Issuing and Ready conditions that you called out. Thank you!
@dcamzn thanks. One tip to reproduce the issue quite easily:
Expected scenario is that it would retry issuing the certificate afterwards automatically/eventually.
My PR will basically just allow this for a configurable grace period. We already use that change successful in production.
thanks @woehrl01 on steps to reproduce the issue. you read our minds!
If it fails at the point of the certificaterequest, we think the right behavior is for the certificaterequest to be deleted by the cert-manager controller and recreated by the cert-manager controller, so that all issuers get fixed.
we'll circle back after reproducing the issue.
@dcamzn great, thank you! Let me share the following docs about the certificaterequest which states that it's not retrying the issuing after failure: https://cert-manager.io/docs/concepts/certificaterequest/
So I guess the Faq is maybe not correct.
Did some additional research, looks like that any of those conditions do not match, and the CertificateRequest
is not getting deleted:
I was able to reproduce and cut this SIM to cert-manager: https://github.com/cert-manager/cert-manager/issues/6405.
I saw that one of the times I reproduced, if I waited long enough, it did retry exactly once. I was not able to get that behavior consistently.
I was able to get retries to work, see https://github.com/cert-manager/cert-manager/issues/6405 for details. It seems the retries just take longer between each try than they did before, which was a change in cert-manager 1.8.
Thank you @divyansh-gupta for investigating, I also have read through the PR, but this is not what I see.
I would like to share the exact logs which shows that cert-manager is retrying, but it's not creating a new CertificateRequest
resource.
The "failed to retrieve provisioner" is the error returned by aws-privateca-issuer
@dcamzn @divyansh-gupta what are your thought on this PR, after syncing with the cert-manager team?
We are also hitting a very similar issue. If you create the cert and the AWSPCAIssuer at the same time, the Certificate request happens before the AWSPCAIssuer is ready. Although cert-manager seems to retry after 1h, it is never successful in creating the certificate.
Hi @woehrl01 - let me share with your my results when replicating the exact situation you described:
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe Certificate
Name: pca-issuer-rsa-cert-retry-blah-1
Namespace: default
Labels: <none>
Annotations: <none>
API Version: cert-manager.io/v1
Kind: Certificate
Metadata:
Creation Timestamp: 2023-10-26T20:46:37Z
Generation: 1
Resource Version: 1364963
UID: 65949e92-9dd4-47c5-885d-d654d4ffb963
Spec:
Dns Names:
issuer-rsa-example.com
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAIssuer
Name: pca-issuer-rsa-2
Private Key:
Algorithm: RSA
Size: 2048
Secret Name: pca-issuer-rsa-cert-tls-retry-blah-1
Subject:
Organizations:
aws
Status:
Conditions:
Last Transition Time: 2023-10-26T20:46:37Z
Message: Issuing certificate as Secret does not exist
Observed Generation: 1
Reason: DoesNotExist
Status: False
Type: Ready
Last Transition Time: 2023-10-26T20:46:37Z
Message: The certificate request has failed to complete and will be retried: issuer could not be found
Observed Generation: 1
Reason: Failed
Status: False
Type: Issuing
Failed Issuance Attempts: 1
Last Failure Time: 2023-10-26T20:46:37Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Issuing 27s cert-manager-certificates-trigger Issuing certificate as Secret does not exist
Normal Generated 27s cert-manager-certificates-key-manager Stored new private key in temporary Secret resource "pca-issuer-rsa-cert-retry-blah-1-nfkg7"
Normal Requested 27s cert-manager-certificates-request-manager Created new CertificateRequest resource "pca-issuer-rsa-cert-retry-blah-1-1"
Warning Failed 27s cert-manager-certificates-issuing The certificate request has failed to complete and will be retried: issuer could not be found
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe CertificateRequest
Name: pca-issuer-rsa-cert-retry-blah-1-1
Namespace: default
Labels: <none>
Annotations: cert-manager.io/certificate-name: pca-issuer-rsa-cert-retry-blah-1
cert-manager.io/certificate-revision: 1
cert-manager.io/private-key-secret-name: pca-issuer-rsa-cert-retry-blah-1-nfkg7
API Version: cert-manager.io/v1
Kind: CertificateRequest
Metadata:
Creation Timestamp: 2023-10-26T20:46:37Z
Generation: 1
Owner References:
API Version: cert-manager.io/v1
Block Owner Deletion: true
Controller: true
Kind: Certificate
Name: pca-issuer-rsa-cert-retry-blah-1
UID: 65949e92-9dd4-47c5-885d-d654d4ffb963
Resource Version: 1364959
UID: 58a242be-e3ae-488e-81bc-1c20289c4838
Spec:
Extra:
authentication.kubernetes.io/pod-name:
cert-manager-75d57c8d4b-j5f6s
authentication.kubernetes.io/pod-uid:
fbc686ee-060a-4215-acbc-39fab0993079
Groups:
system:serviceaccounts
system:serviceaccounts:cert-manager
system:authenticated
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAIssuer
Name: pca-issuer-rsa-2
Request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ2xEQ0NBWHdDQVFBd0RqRU1NQW9HQTFVRUNoTURZWGR6TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBMit5MGcrQzRMdzhwK1VqUlpPSW8zZ0lUQ1BHQ1RHRVBKeUpsL1FQNDdrdHJuZld4ClFyTTMrSUFRdkY4dkNSSTI2dlVBWXo0bnJNeUppbXpsK2FHQ2JPeE8wbmpaRXhjM0RnYUFRYkNHN1ZYQ3NXUGEKUnp6NUpDUCtPSmdFT3MzT1drT0lWdXJ5UE95TEpnVmdoUHR2SURTcGJNY0ozQzNKVmtoSkJQNTVVdkdPeFBrWgpDRUhpaXlsbVdXNEVzMnYrUjRlOGtxUW5kaWJLa1NtNk4wR2g0Ull1N1ppcGpPK1A0LzNYTkNZWFN4Z3JISVRPCjg5TVNYdFpCbENhV1FmR3doZnFCZnJnc1FCdDhwa2w1OVlSTTRadW41aEJUd0IzRFYwUGVGdm80VEZBcWFTbzgKZXU0MEJpMXJMUmszejFVMnBtaWVScWtweVA1aGZzVGVkNEZ2ZlFJREFRQUJvRUV3UHdZSktvWklodmNOQVFrTwpNVEl3TURBaEJnTlZIUkVFR2pBWWdoWnBjM04xWlhJdGNuTmhMV1Y0WVcxd2JHVXVZMjl0TUFzR0ExVWREd1FFCkF3SUZvREFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBbVJwd2Fjbm9naFRVZ2Izc0xJY0FtMmttQXNTRWU0N1AKeE44cmtHRnY4YUdzSmpJaFYyb09QdjkrWkIzNmlEcy9KVmRCaGFwZlpvNTBBNW9DK2hNRzUrS2FlUUlrTW1pVApTUjhGVFVSak1IUXFQSCttenlqNVoxNXErYUhERk1NcXRxeml4WlFDSnJ4V09rbFVIK3RPelR5TytFTmg4bW0vCit0SHFqdnY3clA4eWlKY1JlUVlSMVErQndGTCs0b0NJcjFnYlB5bmNuVjBsUkhCK0NpOFVHWHN4bm1IOG9xa1YKdTkyaU5rUHZ0dTk0NGNqRTVva1hWd1NHc0dDZ1ZBeGNZd0NnbEpNaW9ESmd6NzRBRW5uMlVGcnZXelVpM2lMdApRQUxuOFVWeXYweUF0Tjhsc3V4aGlCQUlJVzUyNENHaCt6MGxsRzVGSWVpZ1E0a1lxc1BueGc9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
UID: f5c14e33-830f-420e-86c1-4d6bce4197ef
Username: system:serviceaccount:cert-manager:cert-manager
Status:
Conditions:
Last Transition Time: 2023-10-26T20:46:37Z
Message: Certificate request has been approved by cert-manager.io
Reason: cert-manager.io
Status: True
Type: Approved
Last Transition Time: 2023-10-26T20:46:37Z
Message: issuer could not be found
Reason: Failed
Status: False
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal cert-manager.io 87s cert-manager-certificaterequests-approver Certificate request has been approved by cert-manager.io
Warning Failed 87s awspcaissuer-controller issuer could not be found
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe AWSPCAIssuer
Name: pca-issuer-rsa-2
Namespace: default
Labels: <none>
Annotations: <none>
API Version: awspca.cert-manager.io/v1beta1
Kind: AWSPCAIssuer
Metadata:
Creation Timestamp: 2023-10-26T20:48:42Z
Generation: 1
Resource Version: 1365212
UID: cb4cb08a-e0c3-4ae5-b7f3-9925ed49428b
Spec:
Arn: arn:aws:acm-pca:us-east-1:444007060692:certificate-authority/da6896c0-cd44-41e9-b436-c291c2018b16
Region: us-east-1
Status:
Conditions:
Last Transition Time: 2023-10-26T20:48:46Z
Message: Issuer verified
Reason: Verified
Status: True
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Verified 1s (x2 over 4s) awspcaissuer-controller Issuer verified
Wait until the 1 hour mark so retry happens
See the certificate failed the second retry -
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe Certificate
Name: pca-issuer-rsa-cert-retry-blah-1
Namespace: default
Labels: <none>
Annotations: <none>
API Version: cert-manager.io/v1
Kind: Certificate
Metadata:
Creation Timestamp: 2023-10-26T20:46:37Z
Generation: 1
Resource Version: 1371932
UID: 65949e92-9dd4-47c5-885d-d654d4ffb963
Spec:
Dns Names:
issuer-rsa-example.com
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAIssuer
Name: pca-issuer-rsa-2
Private Key:
Algorithm: RSA
Size: 2048
Secret Name: pca-issuer-rsa-cert-tls-retry-blah-1
Subject:
Organizations:
aws
Status:
Conditions:
Last Transition Time: 2023-10-26T20:46:37Z
Message: Issuing certificate as Secret does not exist
Observed Generation: 1
Reason: DoesNotExist
Status: False
Type: Ready
Last Transition Time: 2023-10-26T21:46:37Z
Message: The certificate request has failed to complete and will be retried: issuer could not be found
Observed Generation: 1
Reason: Failed
Status: False
Type: Issuing
Failed Issuance Attempts: 2
Last Failure Time: 2023-10-26T21:46:37Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Issuing 22m (x2 over 82m) cert-manager-certificates-trigger Issuing certificate as Secret does not exist
Normal Requested 22m (x2 over 82m) cert-manager-certificates-request-manager Created new CertificateRequest resource "pca-issuer-rsa-cert-retry-blah-1-1"
Warning Failed 22m (x2 over 82m) cert-manager-certificates-issuing The certificate request has failed to complete and will be retried: issuer could not be found
Normal Generated 22m cert-manager-certificates-key-manager Stored new private key in temporary Secret resource "pca-issuer-rsa-cert-retry-blah-1-vfczn"
but see the CertificateRequest shows the cert issued anyway! And I checked, the cert exists and was issued -
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe CertificateRequest
Name: pca-issuer-rsa-cert-retry-blah-1-1
Namespace: default
Labels: <none>
Annotations: cert-manager.io/certificate-name: pca-issuer-rsa-cert-retry-blah-1
cert-manager.io/certificate-revision: 1
cert-manager.io/private-key-secret-name: pca-issuer-rsa-cert-retry-blah-1-vfczn
API Version: cert-manager.io/v1
Kind: CertificateRequest
Metadata:
Creation Timestamp: 2023-10-26T21:46:37Z
Generation: 1
Owner References:
API Version: cert-manager.io/v1
Block Owner Deletion: true
Controller: true
Kind: Certificate
Name: pca-issuer-rsa-cert-retry-blah-1
UID: 65949e92-9dd4-47c5-885d-d654d4ffb963
Resource Version: 1371939
UID: a416b3e9-589a-4f53-9b53-78498fdaf6cb
Spec:
Extra:
authentication.kubernetes.io/pod-name:
cert-manager-75d57c8d4b-j5f6s
authentication.kubernetes.io/pod-uid:
fbc686ee-060a-4215-acbc-39fab0993079
Groups:
system:serviceaccounts
system:serviceaccounts:cert-manager
system:authenticated
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAIssuer
Name: pca-issuer-rsa-2
Request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ2xEQ0NBWHdDQVFBd0RqRU1NQW9HQTFVRUNoTURZWGR6TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBMWVPb1VYbG9ZdVk0SXFKK252UXRiRUpDcFdkWk1GMHY3aTh1UDN3VG9DRlpvODRICmZoWTM4YXFvaHllY2NOZUZxdjRpeEV3Nm92SHY2ZUJYTTNtN2RCT3RsajdtblV4Rk5OSjdEdW05N0R0bVRURkUKSGZTTk9ES2d0L2ZsUktYS1FBUUNqTlhxSTRtZ2pEOHNoVlVEWHNhSjVEQVhqOUw2U3p5QWdzMXpldUJ2WGJuZApmMW1rRXFhMm4vZVA2SU5qRTJsY1V5dmdveGp5K0Myb2cramU3Y0FTZEM0SjYrdDFyUHA3T1FORUNaRG9jaDIyClBrZ3BXbHBNOXZXT0tvZjVVemFBMERuVEs3bnBFdDR5UHptcU5QWmwvK0VMOE45NFZCZEErb2dtdjVjNHVBcHcKRVFYbzFNeUFBTnd6RC85TmxHQTlvNWJnOVJwQzYxU0N5VUI2c3dJREFRQUJvRUV3UHdZSktvWklodmNOQVFrTwpNVEl3TURBaEJnTlZIUkVFR2pBWWdoWnBjM04xWlhJdGNuTmhMV1Y0WVcxd2JHVXVZMjl0TUFzR0ExVWREd1FFCkF3SUZvREFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBeXdza2FIZExRWWI3cFdiMXYvNDREd2JXMFp4Q2cyb1kKbmpSSk5oZ3VEVW43V3F3TlZDOXVnTWlWa0xvTmY3RDVmZ3dONFVNcWliR0NuYTY2WDlHeFJUS1pCNE5YTFE3Sgo4ZnV1cm1IaXF1Vm4vU0ZIbVRqL3NnMDFBVGJRWU9pWmxJUnhaS0oycHdzTjZwT3FNclU3NUplOFhZSEEyaTYvCkw0UGRSZHNVSkU4Z2doL0swb25mcEx6YUhBVmNPL25uS3M0NUFSVkl6MW84UVR3N0EzbnYrZ2RCZDc2YitzQWgKS1JZZUx1a0toMWYxSVBTblZqZmV5dGwwdFVJWktlQkcyWkN0emNuaVdlNXlhZ2xCYlB6aDJ2d3NVcDB1V1dobwo2ODg5d2VGUkE1R0Z6V3ZFWW1aK1VtUE5XZWNQRzNlMlI2blZoOVFTV2VseTNtbDdSZ0xWYXc9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
UID: f5c14e33-830f-420e-86c1-4d6bce4197ef
Username: system:serviceaccount:cert-manager:cert-manager
Status:
Ca: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZekNDQWt1Z0F3SUJBZ0lSQUxKdllWT3V4UFFRTXp3Q3Y5ejNRT2N3RFFZSktvWklodmNOQVFFTEJRQXcKU3pFWE1CVUdBMVVFQ2d3T1FWZFRUMjVCYVhKVWQybDBZMmd4RnpBVkJnTlZCQXNNRGtGWFUwOXVRV2x5VkhkcApkR05vTVJjd0ZRWURWUVFEREE1QlYxTlBia0ZwY2xSM2FYUmphREFlRncweU16QTVNVEl4TnpNMk1qaGFGdzB6Ck16QTVNVEl4T0RNMk1qUmFNRXN4RnpBVkJnTlZCQW9NRGtGWFUwOXVRV2x5VkhkcGRHTm9NUmN3RlFZRFZRUUwKREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVBd3dPUVZkVFQyNUJhWEpVZDJsMFkyZ3dnZ0VpTUEwRwpDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRQ3g0WmFHSUZscjR3ck5SUzhIRDEwWWRhblM3TDhWCjBEb3dieDQyQVpPUkZmd2hpRGZybml2b3ErZDJaWkZrdGxGSTVqdzhRN2I4dkppWnZQdklYNzRyQjJmanZjUjEKZ2NtL1NFSGMwUWRmWWxnT3FycE0yQjI2ZEk0YXpkRTBLODBXbUxXRTA3MDkrTks5UWE5Tkg1bk82ZU1nT2haRQphN1FldHBJUmUxcmlzc0ZqQktOY2prSkwzOUl5Z1dqVnRjeXV2Y2dZVlNnaWJOMjRrQkl0WXFkWTlCSXRwWUNDCmxSWjlhWWpkOEp4SXFQMzFIUWJmSVNhREdoaUNuM2pyWnVwdGp4M29oeUlDV3BhWmVDUnRqb2dGMkZzRkI2dFIKYWNTZTM4YVlBZDg3S1ZRNUsxcUROTU01NitSUlRZeFJURmIyaTVqWUMzSXkwL1Q5aEl6ZFlMbnhBZ01CQUFHagpRakJBTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRkV6WXdZSDI2UjR5UVBWL2QrcDcvYWdCCm1ZWnpNQTRHQTFVZER3RUIvd1FFQXdJQmhqQU5CZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFiK2dSSnNaaW5GY1oKaWVQLzVWSXpBYzhVb0xmVmZrMzcyQkI4ZnZpRTluMDlxbEk2MU44ckhiMUJBeUQzVjNYTE4xSGRlVW90clFhZgpTcjNaVnJkZEFPbDlPaHJFQTBEWjlrYUZNL2p2ZXlIelJQTk1DYSttNnZvaGQycFk1RDUwOWFLT09lc3FmeERrClZJR1ZycXV3OWZhOXJTSWhMcDNaRkJaRTBXbDFPV3F3QzNjMGsxTDRtdHpDeGJ1eFVJQUJZZ0RWRU41M0RlNTUKdXpqUmlJWkE4TkFhb0RPajdMdDdKQmVFSCs4N0JZRmlCWjNMTWN3M1VOaG9UclpJUGs3NnlZVE5YeHRkdlgyOAoxTkZKYVdmSWRpRVJidEJlODYvbDhxVld0SU56RVYxdEhHZjVOSXY1cUJSUFlyQlptd09IdVNXUWFsZitDRUhLCnRORUVQSzBQc2c9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
Certificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZRENDQWtpZ0F3SUJBZ0lRRW52TmJrZ2dVV1FRU1Uxb2djKzZ2VEFOQmdrcWhraUc5dzBCQVFzRkFEQkwKTVJjd0ZRWURWUVFLREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVDd3dPUVZkVFQyNUJhWEpVZDJsMApZMmd4RnpBVkJnTlZCQU1NRGtGWFUwOXVRV2x5VkhkcGRHTm9NQjRYRFRJek1UQXlOakl3TkRZek4xb1hEVEl6Ck1URXlOVEl4TkRZek4xb3dEakVNTUFvR0ExVUVDaE1EWVhkek1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0MKQVE4QU1JSUJDZ0tDQVFFQTFlT29VWGxvWXVZNElxSitudlF0YkVKQ3BXZFpNRjB2N2k4dVAzd1RvQ0Zabzg0SApmaFkzOGFxb2h5ZWNjTmVGcXY0aXhFdzZvdkh2NmVCWE0zbTdkQk90bGo3bW5VeEZOTko3RHVtOTdEdG1UVEZFCkhmU05PREtndC9mbFJLWEtRQVFDak5YcUk0bWdqRDhzaFZVRFhzYUo1REFYajlMNlN6eUFnczF6ZXVCdlhibmQKZjFta0VxYTJuL2VQNklOakUybGNVeXZnb3hqeStDMm9nK2plN2NBU2RDNEo2K3QxclBwN09RTkVDWkRvY2gyMgpQa2dwV2xwTTl2V09Lb2Y1VXphQTBEblRLN25wRXQ0eVB6bXFOUFpsLytFTDhOOTRWQmRBK29nbXY1YzR1QXB3CkVRWG8xTXlBQU53ekQvOU5sR0E5bzViZzlScEM2MVNDeVVCNnN3SURBUUFCbzMwd2V6QWhCZ05WSFJFRUdqQVkKZ2hacGMzTjFaWEl0Y25OaExXVjRZVzF3YkdVdVkyOXRNQWtHQTFVZEV3UUNNQUF3SHdZRFZSMGpCQmd3Rm9BVQpUTmpCZ2ZicEhqSkE5WDkzNm52OXFBR1pobk13SFFZRFZSME9CQllFRkJHcHNSTjF4dEdiZFlYTnYvWm15RGNmCnBhTW5NQXNHQTFVZER3UUVBd0lGb0RBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQUZPWGFORGhyNHZSZHQ4NnQKdlpCaGgvbXpXOVZPekRtTWpPYmdIenlOdGl6eTZIalhqVWprZVZWTWFHeEQxaHZzUmwxbTFocTZXaTI2RHpuSgpLbm9xUnJWUnFsVExMSE5ocWRxWXc4U3lRVExmL1VZVmFjNklqTWIvVVJ4eDYwVjBGbEJmUGpaTVFOOVc3Z3lpCklDeG5aaytiMzNNN1AxWitTWHNvU2hwek5TKzVSdGxubXRYYVZUdjkrM0plYkNCRlVVSzRoMUZnalFpOE5xRkMKVGJ6bkhoM1BDUGR6N3BVZWp6dHo2MnBOc1V3Q1hXOE5IQlRLZW01U1MvWEpLbDluU2x5Wm5HSWdrc2ZpSGxiWgptL2NxUnR2Wlk5amRKeHNYcVhKRHFzUWp5bGtqTlN6TTJSMTF2T3EvWFdwSU1UZFptc2szK2VFZVBJNnlIT3ZiClo1aTF6UT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
Conditions:
Last Transition Time: 2023-10-26T21:46:37Z
Message: Certificate request has been approved by cert-manager.io
Reason: cert-manager.io
Status: True
Type: Approved
Last Transition Time: 2023-10-26T21:46:40Z
Message: certificate issued
Reason: Issued
Status: True
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal cert-manager.io 21m cert-manager-certificaterequests-approver Certificate request has been approved by cert-manager.io
Normal Issued 21m awspcaissuer-controller certificate issued
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe Certificate
Name: pca-issuer-rsa-cert-retry-blah-1
Namespace: default
Labels: <none>
Annotations: <none>
API Version: cert-manager.io/v1
Kind: Certificate
Metadata:
Creation Timestamp: 2023-10-26T20:46:37Z
Generation: 1
Resource Version: 1385854
UID: 65949e92-9dd4-47c5-885d-d654d4ffb963
Spec:
Dns Names:
issuer-rsa-example.com
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAIssuer
Name: pca-issuer-rsa-2
Private Key:
Algorithm: RSA
Size: 2048
Secret Name: pca-issuer-rsa-cert-tls-retry-blah-1
Subject:
Organizations:
aws
Status:
Conditions:
Last Transition Time: 2023-10-26T23:46:40Z
Message: Certificate is up to date and has not expired
Observed Generation: 1
Reason: Ready
Status: True
Type: Ready
Not After: 2023-11-25T23:46:37Z
Not Before: 2023-10-26T22:46:37Z
Renewal Time: 2023-11-15T23:26:37Z
Revision: 1
Events: <none>
[ec2-user@ip-172-31-45-200 ~]$ kubectl describe CertificateRequest
Name: pca-issuer-rsa-cert-retry-blah-1-1
Namespace: default
Labels: <none>
Annotations: cert-manager.io/certificate-name: pca-issuer-rsa-cert-retry-blah-1
cert-manager.io/certificate-revision: 1
cert-manager.io/private-key-secret-name: pca-issuer-rsa-cert-retry-blah-1-p8ssp
API Version: cert-manager.io/v1
Kind: CertificateRequest
Metadata:
Creation Timestamp: 2023-10-26T23:46:37Z
Generation: 1
Owner References:
API Version: cert-manager.io/v1
Block Owner Deletion: true
Controller: true
Kind: Certificate
Name: pca-issuer-rsa-cert-retry-blah-1
UID: 65949e92-9dd4-47c5-885d-d654d4ffb963
Resource Version: 1385848
UID: a72811cd-15e9-4e1f-9302-eeadee8d4bb8
Spec:
Extra:
authentication.kubernetes.io/pod-name:
cert-manager-75d57c8d4b-j5f6s
authentication.kubernetes.io/pod-uid:
fbc686ee-060a-4215-acbc-39fab0993079
Groups:
system:serviceaccounts
system:serviceaccounts:cert-manager
system:authenticated
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAIssuer
Name: pca-issuer-rsa-2
Request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ2xEQ0NBWHdDQVFBd0RqRU1NQW9HQTFVRUNoTURZWGR6TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBM29MaG1BR0hxWlIrUUdZUnNIekpqRTFDRWppR2pqQ3FmcEk0aU9lSTA4T1JDUWJDCklqaCtnM0RtWm5KbUR5MVg5RENzancxWGExa2tVZHVrRzRFYUgzSTdQMFdXOVFMaTFZWHVubFV4cklFTkVaWVUKdGtTT1YweUtvWDMxYW4wcWkyQXRjZUVydFJ2bUF5b0loK2FwKzBRYTRvczlWeitrcGk4Y1ZPSEVMZW5UYXRHaApJUEdwdWpvVFVlZlFKQjRqbG1mUFByekxlNVVDNDlVcUc0dWxERjlEeWp3dUlqdmM0N1U2QzJ5TVJGNjI1eU5EClRsWTl2QlhGV2Z3MkJZZFBhUUdhajluVDBndU5pdmczOEp1WFhQL0x4NGQzeGRDRUo4SVBkOENlcWdkYzNZR2sKRmVYUEVSUmQwSzhGWGVHNnlJd3IwcHdpcWxyL0p4RTlYWVV1MHdJREFRQUJvRUV3UHdZSktvWklodmNOQVFrTwpNVEl3TURBaEJnTlZIUkVFR2pBWWdoWnBjM04xWlhJdGNuTmhMV1Y0WVcxd2JHVXVZMjl0TUFzR0ExVWREd1FFCkF3SUZvREFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBblRGZkNkTzEyQmFGaUkycG84OVhtV0drN3RFSmk2VjIKUVQrQ1k2emVnODVyS2VaZmVDM3BXOXNmNnBRQ3ZOUlZsK096S0xXbU16OW9VbStWYjk2NTVFQlp3UFF4WnlaTQp1Zkd6WnBvQWpYZC95K2tDVVZ1ODNLdm5QVVFYcWUvdW4vRlM3Wm42S2hCS0VwWjl5aU1KRjFIbUdCSHFIV1luCk1KNE9VbGRYelhOa0dLQjZseCsrbzdrN1JXTStsZVJOQzRwbWkycElJck1NMmpqdEdmY1R3ek1PblBKNUxLME0KQjlLSFMwc2hKdlBoTzlHcC8xUldodzFEaTBzT3pOS1Rqc3N4TXQ2Y0o5YUNoVkh5NW1BNnpNaW1JTTlkcnpJdQpIcS9TYnZnK085cm5WUHIrS0k1bG5ZVVBEYndtRVFxN2RvdytTbk9KaUNsNUlvbjdEYzk5dkE9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
UID: f5c14e33-830f-420e-86c1-4d6bce4197ef
Username: system:serviceaccount:cert-manager:cert-manager
Status:
Ca: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZekNDQWt1Z0F3SUJBZ0lSQUxKdllWT3V4UFFRTXp3Q3Y5ejNRT2N3RFFZSktvWklodmNOQVFFTEJRQXcKU3pFWE1CVUdBMVVFQ2d3T1FWZFRUMjVCYVhKVWQybDBZMmd4RnpBVkJnTlZCQXNNRGtGWFUwOXVRV2x5VkhkcApkR05vTVJjd0ZRWURWUVFEREE1QlYxTlBia0ZwY2xSM2FYUmphREFlRncweU16QTVNVEl4TnpNMk1qaGFGdzB6Ck16QTVNVEl4T0RNMk1qUmFNRXN4RnpBVkJnTlZCQW9NRGtGWFUwOXVRV2x5VkhkcGRHTm9NUmN3RlFZRFZRUUwKREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVBd3dPUVZkVFQyNUJhWEpVZDJsMFkyZ3dnZ0VpTUEwRwpDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRQ3g0WmFHSUZscjR3ck5SUzhIRDEwWWRhblM3TDhWCjBEb3dieDQyQVpPUkZmd2hpRGZybml2b3ErZDJaWkZrdGxGSTVqdzhRN2I4dkppWnZQdklYNzRyQjJmanZjUjEKZ2NtL1NFSGMwUWRmWWxnT3FycE0yQjI2ZEk0YXpkRTBLODBXbUxXRTA3MDkrTks5UWE5Tkg1bk82ZU1nT2haRQphN1FldHBJUmUxcmlzc0ZqQktOY2prSkwzOUl5Z1dqVnRjeXV2Y2dZVlNnaWJOMjRrQkl0WXFkWTlCSXRwWUNDCmxSWjlhWWpkOEp4SXFQMzFIUWJmSVNhREdoaUNuM2pyWnVwdGp4M29oeUlDV3BhWmVDUnRqb2dGMkZzRkI2dFIKYWNTZTM4YVlBZDg3S1ZRNUsxcUROTU01NitSUlRZeFJURmIyaTVqWUMzSXkwL1Q5aEl6ZFlMbnhBZ01CQUFHagpRakJBTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRkV6WXdZSDI2UjR5UVBWL2QrcDcvYWdCCm1ZWnpNQTRHQTFVZER3RUIvd1FFQXdJQmhqQU5CZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFiK2dSSnNaaW5GY1oKaWVQLzVWSXpBYzhVb0xmVmZrMzcyQkI4ZnZpRTluMDlxbEk2MU44ckhiMUJBeUQzVjNYTE4xSGRlVW90clFhZgpTcjNaVnJkZEFPbDlPaHJFQTBEWjlrYUZNL2p2ZXlIelJQTk1DYSttNnZvaGQycFk1RDUwOWFLT09lc3FmeERrClZJR1ZycXV3OWZhOXJTSWhMcDNaRkJaRTBXbDFPV3F3QzNjMGsxTDRtdHpDeGJ1eFVJQUJZZ0RWRU41M0RlNTUKdXpqUmlJWkE4TkFhb0RPajdMdDdKQmVFSCs4N0JZRmlCWjNMTWN3M1VOaG9UclpJUGs3NnlZVE5YeHRkdlgyOAoxTkZKYVdmSWRpRVJidEJlODYvbDhxVld0SU56RVYxdEhHZjVOSXY1cUJSUFlyQlptd09IdVNXUWFsZitDRUhLCnRORUVQSzBQc2c9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
Certificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURZRENDQWtpZ0F3SUJBZ0lRYlJmUUJFbGhwcTRZUzMwK0RJb2ZLakFOQmdrcWhraUc5dzBCQVFzRkFEQkwKTVJjd0ZRWURWUVFLREE1QlYxTlBia0ZwY2xSM2FYUmphREVYTUJVR0ExVUVDd3dPUVZkVFQyNUJhWEpVZDJsMApZMmd4RnpBVkJnTlZCQU1NRGtGWFUwOXVRV2x5VkhkcGRHTm9NQjRYRFRJek1UQXlOakl5TkRZek4xb1hEVEl6Ck1URXlOVEl6TkRZek4xb3dEakVNTUFvR0ExVUVDaE1EWVhkek1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0MKQVE4QU1JSUJDZ0tDQVFFQTNvTGhtQUdIcVpSK1FHWVJzSHpKakUxQ0VqaUdqakNxZnBJNGlPZUkwOE9SQ1FiQwpJamgrZzNEbVpuSm1EeTFYOURDc2p3MVhhMWtrVWR1a0c0RWFIM0k3UDBXVzlRTGkxWVh1bmxVeHJJRU5FWllVCnRrU09WMHlLb1gzMWFuMHFpMkF0Y2VFcnRSdm1BeW9JaCthcCswUWE0b3M5Vnora3BpOGNWT0hFTGVuVGF0R2gKSVBHcHVqb1RVZWZRSkI0amxtZlBQcnpMZTVVQzQ5VXFHNHVsREY5RHlqd3VJanZjNDdVNkMyeU1SRjYyNXlORApUbFk5dkJYRldmdzJCWWRQYVFHYWo5blQwZ3VOaXZnMzhKdVhYUC9MeDRkM3hkQ0VKOElQZDhDZXFnZGMzWUdrCkZlWFBFUlJkMEs4RlhlRzZ5SXdyMHB3aXFsci9KeEU5WFlVdTB3SURBUUFCbzMwd2V6QWhCZ05WSFJFRUdqQVkKZ2hacGMzTjFaWEl0Y25OaExXVjRZVzF3YkdVdVkyOXRNQWtHQTFVZEV3UUNNQUF3SHdZRFZSMGpCQmd3Rm9BVQpUTmpCZ2ZicEhqSkE5WDkzNm52OXFBR1pobk13SFFZRFZSME9CQllFRlBuS1NUMkFwSnVBb1RZb3lTVzk2MjZkCmpBL1NNQXNHQTFVZER3UUVBd0lGb0RBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQVNVZEtLcUhvRHM2RnAvR2oKeVcvTGhxcFIzeVh5L1hBc0FKODRGQjZHVkhYcGpKMlBXblhRTHZwVVZrdHp3Yi8wTWtvSGloTWdvQi9vQksxNApHNm5mSkNtaWdoa3lEaTVaZGorN0UzdlVSVW1oYm12bDVXbWN0QitIa1Roa0M2L1NwTldRc0N1aDdGVDVIRXllClFvTlZ2amtyMXk5U0JqbzJTSi9JMXp6aGJvc2lldWhEQnZ3UDUvT3hORWtncFJzTTZ4SVBROTd2ZDZ0RnA5RFAKNUVEZjE5WEJRSjlQSm5Kdk83b2xoYXlGY0t4cEExampoSEFDaE00K2MwUXZ2SXlUVUlVZFJpMmtRZlpBaWZoMwpBWXhPWjNDMEFDMVQ5Qy9FS2RvZEFBTlF6ajU4b1FaUHl1K0ZjRmxKdzIyYVk5bVVZdFA0MWhUbXJ4VjdpTEM5CkE4eU5jdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
Conditions:
Last Transition Time: 2023-10-26T23:46:37Z
Message: Certificate request has been approved by cert-manager.io
Reason: cert-manager.io
Status: True
Type: Approved
Last Transition Time: 2023-10-26T23:46:40Z
Message: certificate issued
Reason: Issued
Status: True
Type: Ready
Events: <none>
To further summarize: When a Certificate is requested before an Issuer is created, the Certificate will fail to issue. When the issuer is created, the Certificate will issue on the next retry (1 hour), but the Certificate object isn't being updated correctly to reflect that even though the CertificateRequest object shows a success. On the next retry (3 hours), the Certificate will also issue again and the Certificate object will also be updated to show that the certificate was issued successfully, so at this point, 2 certificates were issued, and everything is stable.
This is clearly the wrong behavior, and the correct behavior is that on the first retry, when the certificate was issued, the Certificate object should also be updated to reflect the successful issuance. That seems to be the root of the problem that needs to be addressed. I am not yet sure on where to address that issue - it seems that the cert-manager ReadinessController is responsible for taking a CertificateRequest that is successful and updating the Certificate object, so I am looking there: https://github.com/cert-manager/cert-manager/blob/master/pkg/controller/certificates/readiness/readiness_controller.go
@divyansh-gupta thank you for the update. There must be definitely something wrong somewhere. If you look at my event log, you can clearly see that it retried multiple times and never succeed, without deleting the certificaterequest.
Despite I think one hour delay because of a "race condition" during setup is quite long. (in hour case it's not me issue of a deleter issuer but, just that the controller is somehow in a not full ready state (it fails that the controller can't receive the provisioner, even though it did exist the whole time.
There seem to be two distinct problems:
Upon a successful retry after an initial failure, it seems that Certificate objects are not always being marked as successfully issued the first time. This is out of the scope of this PR, and may be an issue with cert-manager's ReadinessController. We will leave this issue out of the scope of this discussion for now and will be looked into seperately.
Our plugin is not retrying quickly enough when common "temporary" errors occur. This PR addresses that.
Thanks for finding these issues and more importanly creating a PR to help!
We will be accepting this PR, however there are two things that I'd like to consider as changes in this PR:
We should default MaxRetryDuration to 180. That should allow for 2 retries, considering each retry is set to occur between 1-2 mins.
We should not retry errors returned from the provisioner.sign
method. Those should be rare and retry slowly/permanently fail, but I believe this PR marks those errors as temporary.
Let me know what you think of the suggestions above! Thanks for your help again!
Detected different CRDs in the config/crd/bases
directory and charts/aws-pca-issuer/crds
directory.
Since both CRDs were modified in this commit(s), they were unable to be automatically synced. Please update the pull request with identical CRDs for this workflow to pass.
@divyansh-gupta Thank you for your feedback, and I agree with your points. I just updated a PR which reflects your suggestions.
Please reach out to me if you think additional changes are needed.
Hi @woehrl01 - thank you! Noting the bot here:
Detected different CRDs in the config/crd/bases directory and charts/aws-pca-issuer/crds directory.
Since both CRDs were modified in this commit(s), they were unable to be automatically synced. Please update the pull request with identical CRDs for this workflow to pass.
Could you please make sure this is synced up? Thank you!
@divyansh-gupta Just pushed so that there are no additional diffs of the CRDs
failure-provisioner-not-found-temporary
This test is failing when run in our automated tests. I am assuming this is because the helm chart file change is not being pulled into our testing.
@divyansh-gupta thank you, looks like we need to "stabilize" the jitter for the tests. Just did that.
Thank you! It looks like the automated testing is continuing to fail: https://github.com/cert-manager/aws-privateca-issuer/actions/runs/6893208888/job/18752298060#step:15:802.
@divyansh-gupta thank you for your patience. I would like to confirm that this change fixed out issue in the past. I'll have a look at the failing tests and update the PR.
PR needs rebase.
Hi! Is this still being worked on?
@cunningr Not sure what you mean with being worked on. It works like this smooth in production.
@cunningr Not sure what you mean with being worked on. It works like this smooth in production.
Apologies - I was looking for a main
branch release where I could pick up this fix.
@bmsiegel tagging you, because you are contributing lately. Is there anything missing to get this merged?
It looks like we need a successful test run, will start that now. It also looks like you'll need a rebase.
https://github.com/cert-manager/aws-privateca-issuer/actions/runs/9666406821/job/26665886635
Testing failed here. I believe this is due to a missing rebase tho.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
@bmsiegel thanks. Just pushed a rebased version. Can you please rerun the tests?
You have a typo in your commit message. I can run the testing but will need to fix that before we merge.
Thanks, I will do a squash if we are ready to merge
Great sounds good. Running tests now.
@bmsiegel Do you have any hint, what could be the root cause for the failing tests. I'm looking since a while at my changes and couldn't find any relation why there would be no condition at all set?
2024-06-26T18:04:11.124744179Z stderr F invalid value "180" for flag -max-retry-duration: parse error
2024-06-26T18:04:11.12488991Z stderr F Usage of /manager:
2024-06-26T18:04:11.124985356Z stderr F -disable-approved-check
2024-06-26T18:04:11.12499068Z stderr F Disables waiting for CertificateRequests to have an approved condition before signing.
2024-06-26T18:04:11.125041441Z stderr F -health-probe-bind-address string
2024-06-26T18:04:11.125045097Z stderr F The address the probe endpoint binds to. (default ":8081")
2024-06-26T18:04:11.125096734Z stderr F -kubeconfig string
2024-06-26T18:04:11.12510018Z stderr F Paths to a kubeconfig. Only required if out-of-cluster.
2024-06-26T18:04:11.125144205Z stderr F -leader-elect
2024-06-26T18:04:11.125147624Z stderr F Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
2024-06-26T18:04:11.125258051Z stderr F -max-retry-duration duration
2024-06-26T18:04:11.1252624Z stderr F Maximum duration to retry failed CertificateRequests. Set to 0 to disable. (default 180ns)
2024-06-26T18:04:11.125310172Z stderr F -metrics-bind-address string
2024-06-26T18:04:11.125313392Z stderr F The address the metric endpoint binds to. (default ":8080")
2024-06-26T18:04:11.125362966Z stderr F -zap-devel
2024-06-26T18:04:11.125366635Z stderr F Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
2024-06-26T18:04:11.125413951Z stderr F -zap-encoder value
2024-06-26T18:04:11.125417062Z stderr F Zap log encoding (one of 'json' or 'console')
2024-06-26T18:04:11.125475648Z stderr F -zap-log-level value
2024-06-26T18:04:11.125479139Z stderr F Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
2024-06-26T18:04:11.125521717Z stderr F -zap-stacktrace-level value
2024-06-26T18:04:11.125524926Z stderr F Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
2024-06-26T18:04:11.1255661Z stderr F -zap-time-encoding value
2024-06-26T18:04:11.125569402Z stderr F Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.
We currently copy our kind logs to s3, pulled this from there
@bmsiegel thank you! Just pushed the right fix
> kubectl logs aws-pca-issuer-1720444904-aws-privateca-issuer-6b8d97946-87n4z
flag provided but not defined: -max-retry-duration
Usage of /manager:
-disable-approved-check
Disables waiting for CertificateRequests to have an approved condition before signing.
-health-probe-bind-address string
The address the probe endpoint binds to. (default ":8081")
-kubeconfig string
Paths to a kubeconfig. Only required if out-of-cluster.
-leader-elect
Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
-metrics-bind-address string
The address the metric endpoint binds to. (default ":8080")
-zap-devel
Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
-zap-encoder value
Zap log encoding (one of 'json' or 'console')
-zap-log-level value
Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
-zap-stacktrace-level value
Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
-zap-time-encoding value
Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.
Since our testing uses our prod ecr version, we end up facing this error from the helm test.
@bmsiegel Thank you for sharing the output. I don't exactly understand what you mean. Are the e2e tests using the prod ECR, which means that the helmchart should be modified to be backwards compatible? (e.g. removing the flag if it isn't set?)
Yes, if there's a way we can make it backwards compatible that is preferable
@bmsiegel I removed the default value from the helm chart, now it should be compatible with the old image as well.
@bmsiegel looks like the tests are successful now. If you there are no additional changes required from your side I will do the squashing of the changes. Wdyt?
Only thing I'd ask is that we remove everything except the necessary changes for the PR. You can cut new PRs for the other things you did here or we can take care of it. Would make the history a little easier to follow.
If a previous certificate request failed, it will be currently never retried again.
This adds a configuration
max-retry-duration
to allow a transition time to recover from transient errors.The PR includes: