pchang388 opened 2 weeks ago
I took a quick look at the changes made in v1.15.0 as mentioned in the release notes (#6878), but at a surface/diff level I didn't see anything that would cause this, so I went ahead and downgraded back to v1.14.4 to see if the issue shows up there as well. It does not: after the downgrade, cert-manager was able to renew the certificate whose manual renewal had been pending since v1.15.0.
Downgrade steps followed (I couldn't find anything saying whether this is supported/recommended):
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.crds.yaml
helm rollback cert-manager -n cert-manager
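Note that helm rollback without a revision number only rolls back to the previous release; if there were other releases in between, something along these lines pins the target explicitly (the revision number below is just an example, check helm history on your own cluster):

# find the revision that deployed v1.14.4
helm history cert-manager -n cert-manager

# roll back to that revision explicitly (3 is an example)
helm rollback cert-manager 3 -n cert-manager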
From what I can tell so far, it appears to be an issue introduced in v1.15.0. I pretty much followed the guide (excluding the cross-account access and IRSA parts) to set up the Route53 provider and ClusterIssuer, and I didn't have problems before.
Would it be a good idea to override AWS_REGION to aws-global to serve as a temporary workaround for v1.15.0? I think this could be OK since the Route53 challenge is the only AWS API cert-manager uses.
Just adding more info which may help debug: I hit the same error on v1.15.1 (both the Helm chart and cert-manager.crds.yaml at the same version):
error instantiating route53 challenge solver: unable to assume role: operation error STS: AssumeRole, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region"
Downgrading to v1.14.7 works for me as a temporary solution.
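In case it helps anyone, downgrading along these lines should work (assuming the standard jetstack Helm repo and release name; adjust to your install):

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.7/cert-manager.crds.yaml
helm upgrade cert-manager jetstack/cert-manager -n cert-manager --version v1.14.7 --reuse-values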
> Would it be a good idea to override AWS_REGION to aws-global to serve as a temporary workaround for v1.15.0? I think this could be OK since the Route53 challenge is the only AWS API cert-manager uses.
@cwyl02 Setting this env var via Helm worked for me as a "workaround", but it looks like challenge cleanup is no longer working, at least for me:
extraEnv:
  - name: AWS_REGION
    value: 'aws-global'
error:
E0628 11:42:46.460550 1 sync.go:283] "error cleaning up challenge" err="failed to change Route 53 record set: operation error Route 53: ChangeResourceRecordSets, https response error StatusCode: 400, RequestID: <REDACTED>, InvalidChangeBatch: [Tried to delete resource record set [name='_acme-challenge.xxxxxxxxxx.', type='TXT', set-identifier='\"yyyyyyyyyyyy\"'] but it was not found]" logger="cert-manager.controller.finalizer" resource_name="aaaaaaaaaaaa" resource_namespace="bbbbbbb" resource_kind="Challenge" resource_version="v1" dnsName="foo.bar.com" type="DNS-01"
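If it helps narrow this down, what is actually left in the zone can be checked directly against Route 53 (the hosted zone ID below is a placeholder):

# list TXT records in the zone to see whether the _acme-challenge entry is really gone
aws route53 list-resource-record-sets \
  --hosted-zone-id Z0123456789ABCDEFGHIJ \
  --query "ResourceRecordSets[?Type=='TXT' && starts_with(Name, '_acme-challenge.')]"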
I guess it is a good idea to wait for #7108 to be merged (the PR is waiting for @pchang388 to confirm this solves the issue)
@hongbo-miao, @cwyl02, @k11h-de We'd be grateful if any of you could test that PR.
Problem: After upgrading to v1.15.0 from v1.14.4 (CRDs upgraded beforehand), I am no longer able to manually trigger a renewal via cmctl. When attempting to do so, error messages show up in the cert-manager pod logs. This worked in previous versions, and my ClusterIssuer configuration hasn't been an issue before; the region field is specified in the route53 solver config.
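For reference, a ClusterIssuer using the Route53 solver with an explicit region and an assumed role generally looks like the sketch below; the email, hosted zone ID, role ARN, and credentials are placeholders rather than my real values, and the static-credentials block is just one possible auth setup:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          route53:
            # region is set explicitly, as it was before the upgrade
            region: us-east-1
            hostedZoneID: Z0123456789ABCDEFGHIJ
            # role that cert-manager assumes for the DNS-01 challenge
            role: arn:aws:iam::111111111111:role/cert-manager-dns01
            accessKeyID: AKIAXXXXXXXXXXXXXXXX
            secretAccessKeySecretRef:
              name: route53-credentials-secret
              key: secret-access-key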
cmctl was used as mentioned above to trigger the renewal.
Expected behaviour: Assume role works with the AWS Route53 provider as it has in previous versions of cert-manager.
Steps to reproduce the bug:
1. Upgrade the CRDs to v1.15.0 from v1.14.4
2. Upgrade the cert-manager Helm release to v1.15.0 from v1.14.4
3. cmctl renew letsencrypt-prod-certificate -n <namespace>
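After step 3, the failure can be observed on the pending challenge and in the controller logs (standard commands, assuming the default cert-manager namespace from the Helm install):

kubectl get challenges -A
kubectl -n cert-manager logs deploy/cert-manager | grep -i route53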
Environment details:
- Kubernetes version: v1.29.4+k3s1
- cert-manager version: v1.15.0
/kind bug