COPRS / rs-issues

This repository contains all the issues of the COPRS project (Scrum tickets, IVV bugs, epics, ...).

[BUG] Expiration of certificates #771

Closed Woljtek closed 1 year ago

Woljtek commented 1 year ago

Environment:

Current Behavior: When a user tries to connect to an RS URL, the web browser returns the following error: NET::ERR_CERT_DATE_INVALID

Expected Behavior: Certificates shall be valid so that users can connect to any RS URL.

Steps To Reproduce:

  1. Connect to any RS URL, for example: KIBANA
  2. The following error appears: NET::ERR_CERT_DATE_INVALID

Test execution artefacts (e.g. logs, screenshots): N/A

Whenever possible, first analysis of the root cause: The certificates have expired (end of validity: 2022-12-27). The automatic renewal of the certificates does not work.
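For reference, the expiration of the certificate actually served can be checked with openssl; the hostname below is illustrative, any RS URL works:

echo | openssl s_client -servername monitoring.platform.ops-csc.com -connect monitoring.platform.ops-csc.com:443 2>/dev/null | openssl x509 -noout -dates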

Bug Generic Definition of Ready (DoR)

Bug Generic Definition of Done (DoD)

Woljtek commented 1 year ago

To work around the issue, use the web browser's private mode and add a security exception.

pcuq-ads commented 1 year ago

Hello and happy new year to the team! We were expecting an automatic renewal of the certificate.

@nleconte-csgroup, we are reading the "how-to" documentation to fix the problem. If you can, it would be nice to provide us with some help (any command) to restore the situation.

Thank you

suberti-ads commented 1 year ago

We see certificate updates in the k8s events:

 kubectl get events -A | grep certifica 
iam               18m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.iam.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:35:02 +0000 UTC: 508b8afd5f4c472100fdd94ddfc802c3
infra             26m         Normal    IssuedLeafCertificate   serviceaccount/cert-manager                           issued certificate for cert-manager.infra.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:26:30 +0000 UTC: dabc0c633a4b1f21458d8474e4281c7b
logging           45m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.logging.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:07:54 +0000 UTC: d8919da6b36373cb96578d61071f2d5d
processing        22m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:31:19 +0000 UTC: 443b093538c9637f7c7f6c85f971bc76
processing        22m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:31:21 +0000 UTC: 19e0c1e29b6d3ac498b547512b3b093d
processing        22m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:31:26 +0000 UTC: f92e7a7b8eb4a4a04d5cf2df77cd439f
processing        22m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:31:27 +0000 UTC: 3cab75dfb51910d4db8051548da0eb51
processing        19m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:34:09 +0000 UTC: b143986825a034c89919f5b0195d74ba
processing        19m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:34:15 +0000 UTC: b2168234c251b16d00b68464cd2fea73
processing        19m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:34:16 +0000 UTC: 15ee03f7fd32f0b956247eeeaa658fd3
processing        19m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:34:24 +0000 UTC: 1e44d808243fa37963f191ac60cf58bb
processing        19m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:34:24 +0000 UTC: 6449a03ffa854f34135980cb052b6c96
processing        16m         Normal    IssuedLeafCertificate   serviceaccount/default                                issued certificate for default.processing.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:36:33 +0000 UTC: 8c564be575254662e782c129511d8710
security          51m         Normal    IssuedLeafCertificate   serviceaccount/falco-exporter                         issued certificate for falco-exporter.security.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:02:21 +0000 UTC: ff5f69425eb6418e00eb882af7961f01
security          14m         Normal    IssuedLeafCertificate   serviceaccount/falco-exporter                         issued certificate for falco-exporter.security.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:39:03 +0000 UTC: 156a4f414b040ad5893995c6274551b9
security          51m         Normal    IssuedLeafCertificate   serviceaccount/falco                                  issued certificate for falco.security.serviceaccount.identity.networking.cluster.local until 2023-01-04 14:02:20 +0000 UTC: 388db2c06b2fd8b2e88e0b2972a6eb92

The ingress-tls certificate appears to have been expired since "2022-12-27T12:24:02Z":

 safescale  gw-cluster-ops  ~  kubectl get certificates.cert-manager.io  -A
NAMESPACE    NAME                       READY   SECRET                             AGE
networking   ingress-tls                False   ingress-tls                        110d
[...]
 safescale  gw-cluster-ops  ~  kubectl get certificates.cert-manager.io -n networking  ingress-tls  -o yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2022-09-14T14:24:33Z"
  generation: 6
  labels:
    app.kubernetes.io/instance: apisix
  name: ingress-tls
  namespace: networking
  resourceVersion: "23852959"
  uid: 42f908cc-de41-48b9-8de4-7b711e4cbb3f
spec:
  dnsNames:
  - '*.platform.ops-csc.com'
  issuerRef:
    kind: Issuer
    name: ingress-tls
  secretName: XXXX
  usages:
  - digital signature
  - key encipherment
status:
  conditions:
  - lastTransitionTime: "2022-10-07T14:37:13Z"
    message: 'Fields on existing CertificateRequest resource not up to date: [spec.dnsNames]'
    observedGeneration: 6
    reason: RequestChanged
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-10-07T14:37:19Z"
    message: 'Fields on existing CertificateRequest resource not up to date: [spec.dnsNames]'
    observedGeneration: 6
    reason: RequestChanged
    status: "True"
    type: Issuing
  nextPrivateKeySecretName: ingress-tls-xrkv8
  notAfter: "2022-12-27T12:24:02Z"
  notBefore: "2022-09-28T12:24:03Z"
  renewalTime: "2022-11-27T12:24:02Z"
  revision: 3
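
For reference, the cert-manager controller logs below can typically be retrieved with something like the following (the deployment name is an assumption; cert-manager appears to run in the infra namespace per the events above):

kubectl -n infra logs deploy/cert-manager --tail=200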

Hereafter the cert-manager log:

I0103 14:26:38.260691       1 trace.go:205] Trace[1319325238]: "Reflector ListAndWatch" name:external/io_k8s_client_go/tools/cache/reflector.go:167 (03-Jan-2023 14:26:23.061) (total time: 15199ms):
Trace[1319325238]: ---"Objects listed" 15199ms (14:26:38.260)
Trace[1319325238]: [15.199621345s] [15.199621345s] END
E0103 14:26:38.661151       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/ingress-tls-8hmf7-2013388072" 
E0103 14:26:38.860312       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/ingress-tls-kppsl-2013388072" 
E0103 14:26:38.860402       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/ingress-tls-mw42g-4162737047" 
E0103 14:26:38.861241       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/sentinelprocessors-n74ln-1865814251" 
E0103 14:26:39.059635       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/ingress-tls-d4lqq-3316629141" 
E0103 14:26:39.259609       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/sentinelprocessors-s58lv-1865814251" 
I0103 14:26:40.460022       1 setup.go:202] cert-manager/controller/issuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-http-prod" "related_resource_namespace"="networking" "resource_kind"="Issuer" "resource_name"="ingress-tls" "resource_namespace"="networking" "resource_version"="v1" 
I0103 14:26:43.263676       1 setup.go:202] cert-manager/controller/issuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-http-prod" "related_resource_namespace"="networking" "resource_kind"="Issuer" "resource_name"="ingress-tls" "resource_namespace"="networking" "resource_version"="v1" 
E0103 14:26:44.261194       1 sync.go:122] cert-manager/controller/orders "msg"="Failed to determine the list of Challenge resources needed for the Order" "error"="no configured challenge solvers can be used for this challenge" "resource_kind"="Order" "resource_name"="ingress-tls-mw42g-4162737047" "resource_namespace"="networking" "resource_version"="v1" 
I0103 14:27:02.160491       1 trace.go:205] Trace[149057581]: "Reflector ListAndWatch" name:external/io_k8s_client_go/tools/cache/reflector.go:167 (03-Jan-2023 14:26:22.660) (total time: 39499ms):
Trace[149057581]: ---"Objects listed" 39399ms (14:27:02.060)
Trace[149057581]: [39.499518724s] [39.499518724s] END
pcuq-ads commented 1 year ago

In the cert-manager log, we see the following first lines:

I0103 14:26:17.459343       1 start.go:75] cert-manager "msg"="starting controller"  "git-commit"="5ecf5b5617a4813ea8115da5dcfe3cd18b8ff047" "version"="v1.6.1"
W0103 14:26:17.461434       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.

Could it be a problem with the kubeconfig file reference?

nleconte-csgroup commented 1 year ago

Hey, you could try to update the ingress-tls certificate's dnsNames, changing from:

[...]
spec:
  dnsNames:
  - '*.platform.ops-csc.com'
[...]

to:

[...]
spec:
  dnsNames:
  - 'apisix.platform.ops-csc.com'
  - 'iam.platform.ops-csc.com'
  - 'monitoring.platform.ops-csc.com'
  - 'processing.platform.ops-csc.com'
  - 'security.platform.ops-csc.com'
  - 'uwc.platform.ops-csc.com'
[...]

Note: If you created other subdomains, add them here as well.
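
If it helps, one way to apply such a change directly to the Certificate resource is a merge patch (a sketch only; if the dnsNames are managed by your deployment tooling, updating the source configuration and redeploying is the cleaner route):

kubectl -n networking patch certificate ingress-tls --type merge \
  -p '{"spec":{"dnsNames":["apisix.platform.ops-csc.com","iam.platform.ops-csc.com","monitoring.platform.ops-csc.com","processing.platform.ops-csc.com","security.platform.ops-csc.com","uwc.platform.ops-csc.com"]}}'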

suberti-ads commented 1 year ago

Hereafter the updated TLS configuration:

  - 'apisix.{{ platform_domain_name }}'
  - 'linkerd.{{ platform_domain_name }}'
  - 'iam.{{ platform_domain_name }}'
  - 'monitoring.{{ platform_domain_name }}'
  - 'processing.{{ platform_domain_name }}'
  - 'security.{{ platform_domain_name }}'

The configuration update seems to have been taken into account:

 safescale  gw-cluster-ops  ~  kubectl get certificates.cert-manager.io -n networking  ingress-tls  -o yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2022-09-14T14:24:33Z"
  generation: 7
  labels:
    app.kubernetes.io/instance: apisix
  name: ingress-tls
  namespace: networking
  resourceVersion: "124252183"
  uid: 42f908cc-de41-48b9-8de4-7b711e4cbb3f
spec:
  dnsNames:
  - apisix.platform.ops-csc.com
  - linkerd.platform.ops-csc.com
  - iam.platform.ops-csc.com
  - monitoring.platform.ops-csc.com
  - processing.platform.ops-csc.com
  - security.platform.ops-csc.com
  - uwc.platform.ops-csc.com
  issuerRef:
    kind: Issuer
    name: ingress-tls
  secretName: XXXXXX
  usages:
  - digital signature
  - key encipherment
status:
  conditions:
  - lastTransitionTime: "2023-01-03T15:55:18Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 7
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2023-04-03T14:55:00Z"
  notBefore: "2023-01-03T14:55:01Z"
  renewalTime: "2023-03-04T14:55:00Z"
  revision: 4

Hereafter the cert-manager log after the apisix upgrade:

I0103 15:49:14.805636       1 requestmanager_controller.go:314] cert-manager/controller/certificates-request-manager "msg"="CertificateRequest does not match requirements on certificate.spec, deleting CertificateRequest" "key"="networking/ingress-tls" "related_resource_kind"="CertificateRequest" "related_resource_name"="ingress-tls-mw42g" "related_resource_namespace"="networking" "related_resource_version"="v1" "violations"=["spec.dnsNames"]
E0103 15:49:14.959704       1 controller.go:171] cert-manager/controller/orders "msg"="order in work queue no longer exists" "error"="order.acme.cert-manager.io \"ingress-tls-mw42g-4162737047\" not found"  
E0103 15:49:24.881373       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:24.965104       1 acme.go:137] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="Failed create new order resource networking/ingress-tls-mw42g-4162737047" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "resource_kind"="CertificateRequest" "resource_name"="ingress-tls-mw42g" "resource_namespace"="networking" "resource_version"="v1" 
E0103 15:49:24.965218       1 sync.go:136] cert-manager/controller/certificaterequests-issuer-acme "msg"="error issuing certificate request" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "related_resource_kind"="Issuer" "related_resource_name"="ingress-tls" "related_resource_namespace"="networking" "related_resource_version"="v1" "resource_kind"="CertificateRequest" "resource_name"="ingress-tls-mw42g" "resource_namespace"="networking" "resource_version"="v1" 
E0103 15:49:25.063765       1 controller.go:163] cert-manager/controller/certificaterequests-issuer-acme "msg"="re-queuing item due to error processing" "error"="[Operation cannot be fulfilled on certificaterequests.cert-manager.io \"ingress-tls-mw42g\": StorageError: invalid object, Code: 4, Key: /registry/cert-manager.io/certificaterequests/networking/ingress-tls-mw42g, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 5ca960e8-dfa6-46c6-baf1-e660302e9c5b, UID in object meta: , Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded]" "key"="networking/ingress-tls-mw42g" 
E0103 15:49:25.265287       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:34.970048       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:35.272585       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:44.980302       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:45.368612       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:54.991004       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:49:55.465972       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:50:05.001499       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:50:05.472988       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:50:15.013017       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:50:15.479866       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:50:31.013356       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls" 
E0103 15:50:31.482004       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.infra.svc:443/mutate?timeout=10s\": context deadline exceeded" "key"="networking/ingress-tls"

As the issue was still present, I manually restarted the cert-manager pod. Same behavior.
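
For reference, the restart can be done with something along these lines (the deployment name is an assumption; cert-manager appears to run in the infra namespace):

# recreate the cert-manager controller pod
kubectl -n infra rollout restart deployment cert-manager
kubectl -n infra get pods | grep cert-manager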

Hereafter the log found after the restart:

E0103 15:56:01.460437       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/sentinelprocessors-n74ln-1865814251" 
I0103 15:56:01.460786       1 setup.go:202] cert-manager/controller/issuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-http-prod" "related_resource_namespace"="networking" "resource_kind"="Issuer" "resource_name"="ingress-tls" "resource_namespace"="networking" "resource_version"="v1" 
E0103 15:56:01.460431       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/sentinelprocessors-s58lv-1865814251" 
E0103 15:56:01.659511       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/ingress-tls-d4lqq-3316629141" 
E0103 15:56:03.659842       1 controller.go:163] cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="networking/ingress-tls-kppsl-2013388072" 
I0103 15:56:04.659296       1 setup.go:202] cert-manager/controller/issuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-http-prod" "related_resource_namespace"="networking" "resource_kind"="Issuer" "resource_name"="ingress-tls" "resource_namespace"="networking" "resource_version"="v1" 
I0103 15:56:22.460819       1 trace.go:205] Trace[1262032243]: "Reflector ListAndWatch" name:external/io_k8s_client_go/tools/cache/reflector.go:167 (03-Jan-2023 15:55:44.361) (total time: 38099ms):
Trace[1262032243]: ---"Objects listed" 37899ms (15:56:22.260)
Trace[1262032243]: [38.099703294s] [38.099703294s] END
nleconte-csgroup commented 1 year ago

The renewal was successful:

 safescale  gw-cluster-ops  ~  kubectl get certificates ingress-tls -n networking -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2022-09-14T14:24:33Z"
  generation: 7
  labels:
    app.kubernetes.io/instance: apisix
  name: ingress-tls
  namespace: networking
  resourceVersion: "124252183"
  uid: 42f908cc-de41-48b9-8de4-7b711e4cbb3f
spec:
  dnsNames:
  - apisix.platform.ops-csc.com
  - linkerd.platform.ops-csc.com
  - iam.platform.ops-csc.com
  - monitoring.platform.ops-csc.com
  - processing.platform.ops-csc.com
  - security.platform.ops-csc.com
  - uwc.platform.ops-csc.com
  issuerRef:
    kind: Issuer
    name: ingress-tls
  secretName: ingress-tls
  usages:
  - digital signature
  - key encipherment
status:
  conditions:
  - lastTransitionTime: "2023-01-03T15:55:18Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 7
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2023-04-03T14:55:00Z"
  notBefore: "2023-01-03T14:55:01Z"
  renewalTime: "2023-03-04T14:55:00Z"
  revision: 4

It looks like the secret was not updated:

 safescale  gw-cluster-ops  ~  kubectl get secret ingress-tls --namespace networking --show-managed-fields -o jsonpath='{range .metadata.managedFields[*]}{.manager}{" did "}{.operation}{" at "}{.time}{"\n"}{end}'
controller did Update at 2022-09-28T13:24:05Z

This looks like the same bug as here: https://github.com/cert-manager/cert-manager/issues/2644

So please try to remove the ingress-tls secret and restart cert-manager if the secret is not recreated automatically.
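
A minimal sketch of that procedure (the secret name comes from the Certificate spec above; back it up first in case it has to be restored):

# back up the stale secret, then delete it so cert-manager re-issues it
kubectl -n networking get secret ingress-tls -o yaml > ingress-tls-secret.backup.yaml
kubectl -n networking delete secret ingress-tls
# if the secret does not reappear after a few minutes, restart cert-manager
kubectl -n infra rollout restart deployment cert-manager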

Note: Please remove the wildcard that you just added back to the list. The certificate cannot use both a wildcard and a subdomain that the wildcard matches.

suberti-ads commented 1 year ago

Thanks, it is working! To summarize:

nleconte-csgroup commented 1 year ago

You're welcome.

1) You had to delete the secret because the cluster update changed the default challenge mode from dns01 to http01. This created a mismatch between the deployed certs and the new renewal method. The http01 method cannot renew a wildcard certificate.
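
For context, a minimal sketch of an http01-based ACME Issuer (the solver details below are assumptions, not the actual platform configuration; only the issuer and ACME account secret names come from the logs above). http01 proves control of a concrete hostname over HTTP, so it cannot validate a wildcard such as '*.platform.ops-csc.com'; that requires dns01:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ingress-tls
  namespace: networking
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-http-prod
    solvers:
    - http01:
        ingress:
          class: apisix   # assumed ingress class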

2) Also, during the investigation, I found out that the sentinelprocessors secret was not updated either, because it was manually imported later on. The Certificate resource in k8s is up to date:

 safescale  gw-cluster-ops  ~  kubectl get certificates -n networking sentinelprocessors -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cert-manager.io/v1","kind":"Certificate","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"apisix"},"name":"sentinelprocessors","namespace":"networking"},"spec":{"dnsNames":["sentinelprocessors.copernicus.eu"],"issuerRef":{"kind":"Issuer","name":"ingress-tls"},"secretName":"sentinelprocessors-tls","usages":["digital signature","key encipherment"]}}
  creationTimestamp: "2022-10-11T09:43:23Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: apisix
  name: sentinelprocessors
  namespace: networking
  resourceVersion: "100918589"
  uid: 8aa64838-f1e2-413f-8a63-86053d9735cd
spec:
  dnsNames:
  - sentinelprocessors.copernicus.eu
  issuerRef:
    kind: Issuer
    name: ingress-tls
  secretName: sentinelprocessors-tls
  usages:
  - digital signature
  - key encipherment
status:
  conditions:
  - lastTransitionTime: "2022-10-11T09:43:39Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2023-03-10T07:46:03Z"
  notBefore: "2022-12-10T07:46:04Z"
  renewalTime: "2023-02-08T07:46:03Z"
  revision: 2

But the secret was not updated: if we look at the TLS certificate actually served, the dates do not match and it is close to expiring:

nleconte@PO19728:~$ echo | openssl s_client -servername sentinelprocessors.copernicus.eu -connect sentinelprocessors.copernicus.eu:443  2>/dev/null | openssl x509 -noout -dates
notBefore=Oct 11 08:45:08 2022 GMT
notAfter=Jan  9 08:45:07 2023 GMT
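
To confirm that the secret itself is stale, the certificate stored in it can be decoded and its dates compared with the Certificate resource above (a sketch; the secret name comes from the spec):

kubectl -n networking get secret sentinelprocessors-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates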

So please try to back up and delete the sentinelprocessors-tls secret (and restart the cert-manager pod if the secret is not recreated automatically).

suberti-ads commented 1 year ago

This is also working. After the secret deletion, a new secret was automatically generated:

suberti@refsys-client:~$ echo | openssl s_client -servername sentinelprocessors.copernicus.eu -connect sentinelprocessors.copernicus.eu:443  2>/dev/null | openssl x509 -noout -dates
notBefore=Jan  4 08:51:43 2023 GMT
notAfter=Apr  4 08:51:42 2023 GMT

Do you think we will have to force the deletion each time, or will the secret be regenerated automatically before the next expiration?

nleconte-csgroup commented 1 year ago

Do you think we will have to force the deletion each time, or will the secret be regenerated automatically before the next expiration?

No, it should be OK from now on, unless there is a bug. We had to do it this time because the migration from the old to the new OPS cluster created some discrepancies between the configuration in K8s and the manually deployed TLS certificates.

LAQU156 commented 1 year ago

IVV_CCB_2023_w01: Closed (fixed)