canonical / istio-operators

Charmed Istio

Adding self-signed-certificates and removing the relation breaks the charm #441

Closed dparv closed 5 days ago

dparv commented 2 weeks ago

Bug Description

istio-pilot/0*                error        idle   10.244.2.10                 hook failed: "certificates-relation-broken"

and I can't access the Kubeflow dashboard.

To Reproduce

juju deploy self-signed-certificates --channel edge
juju relate istio-pilot:certificates self-signed-certificates:certificates
juju remove-relation istio-pilot:certificates self-signed-certificates:certificates

Environment

juju 3.4.3
istio-pilot 1.17/stable 965
self-signed-certificates latest/edge 145

Relevant Log Output

unit-istio-pilot-0: 13:52:26 WARNING unit.istio-pilot/0.juju-log certificates:56: 'app' expected but not received.
unit-istio-pilot-0: 13:52:26 WARNING unit.istio-pilot/0.juju-log certificates:56: 'app_name' expected in snapshot but not found.
unit-istio-pilot-0: 13:52:26 INFO unit.istio-pilot/0.juju-log certificates:56: Creating CSR for 57.152.89.25 with DNS ['istio-pilot-0.istio-pilot-endpoints.kubeflow.svc.cluster.local'] and IPs []
unit-istio-pilot-0: 13:52:26 ERROR unit.istio-pilot/0.juju-log certificates:56: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1203, in <module>
    main(Operator)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1582, in _on_relation_broken
    self.on.all_certificates_invalidated.emit()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/observability_libs/v0/cert_handler.py", line 420, in _on_all_certificates_invalidated
    self._generate_csr(overwrite=True, clear_cert=True)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/observability_libs/v0/cert_handler.py", line 272, in _generate_csr
    self.certificates.request_certificate_creation(certificate_signing_request=csr)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1421, in request_certificate_creation
    raise RuntimeError(
RuntimeError: Relation certificates does not exist - The certificate request can't be completed


Additional Context

No response

syncronize-issues-to-jira[bot] commented 2 weeks ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5876.

This message was autogenerated

DnPlas commented 1 week ago

Reported issue

I was able to reproduce the issue.

My model:

Model      Controller  Cloud/Region        Version  SLA          Timestamp
istio-441  uk8s-343    microk8s/localhost  3.4.3    unsupported  18:54:00Z

App                       Version  Status   Scale  Charm                     Channel       Rev  Address         Exposed  Message
istio-ingressgateway               active       1  istio-gateway             1.17/stable  1000  10.152.183.212  no
istio-pilot                        waiting      1  istio-pilot               1.17/stable   965  10.152.183.210  no       installing agent
self-signed-certificates           active       1  self-signed-certificates  latest/edge   147  10.152.183.76   no

Unit                         Workload  Agent  Address      Ports  Message
istio-ingressgateway/0*      active    idle   10.1.60.140
istio-pilot/0*               error     idle   10.1.60.137         hook failed: "certificates-relation-broken" for self-signed-certificates:certificates
self-signed-certificates/0*  active    idle   10.1.60.138

Integration provider                   Requirer                          Interface          Type     Message
istio-pilot:istio-pilot                istio-ingressgateway:istio-pilot  k8s-service        regular
istio-pilot:peers                      istio-pilot:peers                 istio_pilot_peers  peer
self-signed-certificates:certificates  istio-pilot:certificates          tls-certificates   regular

juju debug-log output:

unit-istio-pilot-0: 18:53:15 INFO unit.istio-pilot/0.juju-log certificates:1: Creating CSR for 10.64.140.43 with DNS ['istio-pilot-0.istio-pilot-endpoints.istio-441.svc.cluster.local'] and IPs []
unit-istio-pilot-0: 18:53:15 ERROR unit.istio-pilot/0.juju-log certificates:1: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1203, in <module>
    main(Operator)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1582, in _on_relation_broken
    self.on.all_certificates_invalidated.emit()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/observability_libs/v0/cert_handler.py", line 420, in _on_all_certificates_invalidated
    self._generate_csr(overwrite=True, clear_cert=True)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/observability_libs/v0/cert_handler.py", line 272, in _generate_csr
    self.certificates.request_certificate_creation(certificate_signing_request=csr)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1421, in request_certificate_creation
    raise RuntimeError(
RuntimeError: Relation certificates does not exist - The certificate request can't be completed

Potential cause

The error is raised while the cert_handler library handles the event. At first glance, it looks like on a relation_broken event the tls_certificates library (which the cert_handler library uses under the hood) emits an all_certificates_invalidated event. The cert_handler library then calls _on_all_certificates_invalidated, which tries to generate a CSR and request a certificate, but since the relation no longer exists, the request fails.
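
The chain described above can be sketched in plain Python. This is a simplified model, not the code from either library and not the actual fix: FakeCertificatesRequirer, UnguardedHandler and GuardedHandler are hypothetical stand-ins, and the guard shown is only one possible way to avoid the failure.

```python
class FakeCertificatesRequirer:
    """Hypothetical stand-in for the tls_certificates requirer object."""

    def __init__(self, relation_exists):
        self.relation_exists = relation_exists
        self.requests = []

    def request_certificate_creation(self, certificate_signing_request):
        # Mirrors the behaviour seen in the traceback: no relation, no request.
        if not self.relation_exists:
            raise RuntimeError(
                "Relation certificates does not exist - "
                "The certificate request can't be completed"
            )
        self.requests.append(certificate_signing_request)


class UnguardedHandler:
    """Models the failing path: always requests a new certificate."""

    def __init__(self, certificates):
        self.certificates = certificates

    def on_all_certificates_invalidated(self):
        csr = b"fake-csr"  # placeholder, a real handler would generate a CSR
        self.certificates.request_certificate_creation(csr)


class GuardedHandler(UnguardedHandler):
    """Assumed mitigation: skip the request when the relation is gone."""

    def on_all_certificates_invalidated(self):
        if not self.certificates.relation_exists:
            return  # relation was removed, nothing to request against
        super().on_all_certificates_invalidated()


if __name__ == "__main__":
    broken = FakeCertificatesRequirer(relation_exists=False)
    GuardedHandler(broken).on_all_certificates_invalidated()  # no error
    try:
        UnguardedHandler(broken).on_all_certificates_invalidated()
    except RuntimeError as err:
        print(f"unguarded handler failed: {err}")
```

In this model the unguarded handler fails with the same message as in the logs, while the guarded one returns without making a request.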

I have pinged the maintainers of the library I'm referring to, will come back with an update.

State of TLS certificates integration

Just as a quick check, I did the following to ensure the TLS certificates were in fact passed and rendered correctly in the Gateway and Secret objects:

  1. Deploy istio-operators 1.17/stable
  2. Deploy self-signed-certificates latest/edge
  3. Add relations
  4. Check the Gateway object and the Secret it references

My model:

Model      Controller  Cloud/Region        Version  SLA          Timestamp
istio-441  uk8s-343    microk8s/localhost  3.4.3    unsupported  18:49:30Z

App                       Version  Status  Scale  Charm                     Channel       Rev  Address         Exposed  Message
istio-ingressgateway               active      1  istio-gateway             1.17/stable  1000  10.152.183.212  no
istio-pilot                        active      1  istio-pilot               1.17/stable   965  10.152.183.210  no
self-signed-certificates           active      1  self-signed-certificates  latest/edge   147  10.152.183.76   no

Unit                         Workload  Agent  Address      Ports  Message
istio-ingressgateway/0*      active    idle   10.1.60.140
istio-pilot/0*               active    idle   10.1.60.137
self-signed-certificates/0*  active    idle   10.1.60.138

Integration provider                   Requirer                          Interface          Type     Message
istio-pilot:istio-pilot                istio-ingressgateway:istio-pilot  k8s-service        regular
istio-pilot:peers                      istio-pilot:peers                 istio_pilot_peers  peer
self-signed-certificates:certificates  istio-pilot:certificates          tls-certificates   regular

The Gateway and Secret objects:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  creationTimestamp: "2024-06-17T18:44:40Z"
  generation: 2
  labels:
    app.juju.is/created-by: istio-pilot
    app.kubernetes.io/instance: istio-pilot-istio-441
    kubernetes-resource-handler-scope: gateway
  name: istio-gateway
  namespace: istio-441
  resourceVersion: "1420"
  uid: f56edf34-5dd7-4d5a-b50c-1e6b7f977e89
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: https
      number: 8443
      protocol: HTTPS
    tls: # <--- it is configured for TLS
      credentialName: istio-gateway-gateway-secret # <--- it references this secret
      mode: SIMPLE
---
apiVersion: v1
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURyakNDQXBhZ0F3SUJBZ0lVWlV3L0x0aWs2MjNic3FSWWNyRlBjMUJwYjhNd0RRWUpLb1pJaHZjTkFRRUwKQlFBd09URUxNQWtHQTFVRUJoTUNWVk14S2pBb0JnTlZCQU1NSVhObGJHWXRjMmxuYm1Wa0xXTmxjblJwWm1sagpZWFJsY3kxdmNHVnlZWFJ2Y2pBZUZ3MHlOREEyTVRjeE9EUTFNRFJhRncweU5UQTJNVGN4T0RRMU1EUmFNRWN4CkZqQVVCZ05WQkFNTURXbHpkR2x2TFhCcGJHOTBMVEF4TFRBckJnTlZCQzBNSkRGaU4yRXlaRGhsTFdVNU4yVXQKTkRBd09TMDRPVGN5TFRRMU1HRXdZbVUxT1RnME1EQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQwpBUW9DZ2dFQkFNOU1yS1VkZXRJOGJMeFo0Mi9VY2FXaGtKVEpzT0IwRVRxTzlENUxNSUdtZXI1d3ZLc1dmc2Q4CmMxOHV2bUtnc2pCM2tZVVV0bDNIa0xxdHlwU1ZXNkZyOUVPaWI2TGVadFFSTmFYZm11RFN1UjBqMk9jRTJzem8KdDVwRDM3MFJOTVB2eG9BT0szN3U3dkM2VjRaL2ZudnFPaWlaVDZjaU5UQjJSWmpzYTVoWjdSUHZSOW5WaXRhLwpoODZhQmkxdThaNDFpUlhTZkxlTUxDNFdYcEhwL2x2a0JRVVNwWUIyRGs0VDF0Mm90cjNhbjEzbGdMYWtmdk5XCmJYWmpzRWxYWVFCeEhHQmYzN0oraUhjOU9YM25ybnVVY3o2SGgzbG9WekpROGIwTktvMTlDbFZWbTdtbThWVjIKMnIvcVR0VXQ2dDViWE12QUQwRUFtQWhHNFEvL3dXOENBd0VBQWFPQm56Q0JuREFoQmdOVkhTTUVHakFZZ0JZRQpGSVo1ZkVqeUowNEM5U1IrbW80WWRvRWlnRkFzTUIwR0ExVWREZ1FXQkJUL0RFZmxvbTdUNGF6VUpmNXl6L09GCkpNS05KVEFNQmdOVkhSTUJBZjhFQWpBQU1Fb0dBMVVkRVFSRE1FR0NQMmx6ZEdsdkxYQnBiRzkwTFRBdWFYTjAKYVc4dGNHbHNiM1F0Wlc1a2NHOXBiblJ6TG1semRHbHZMVFEwTVM1emRtTXVZMngxYzNSbGNpNXNiMk5oYkRBTgpCZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFkdzc5UWhJN0pVcUV3MzRwZysrRkJDSitKVUU3OFpZVURrVHNIQWVZClZEUlpWcUwyaENnL0poU2k3RHFrYUcwRjh6UkdadGxzcUFCdEdEYmhPZC9WM3BiOUtYVTRUSHl6UWhPYmlEWHkKYXRwQ2REUnEwUDVUeGpBT2l6YnJIZHlyOXc2c0FFd1VEcldKclQ2NjFOVjFNazE3YUluTVZZdFlNMExsS0h5YgpkTGZ4NmZjcGVCeXJXVjQ2cjZLTVlKQWoyd2lORjhlSXdpK0NMd2tiUGwwR1FHd3lVK3NSV1EwVmtuWk5ESVlyCjhzT0wrbUV5VVNBNmJ3RmF3dFVxUHJPRDI5RXJ5VlF0RkVtYit6cWVuN2VUNXBLN2FRRDE2NDR2TEdEajJEYzkKNkVIM1pRZ2VjUHBFcHJWTW04NTdEWC9XTTdDcDQxUThURDF5SUVGREZCSThSdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0=
  tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcFFJQkFBS0NBUUVBejB5c3BSMTYwanhzdkZuamI5UnhwYUdRbE1tdzRIUVJPbzcwUGtzd2dhWjZ2bkM4CnF4Wit4M3h6WHk2K1lxQ3lNSGVSaFJTMlhjZVF1cTNLbEpWYm9XdjBRNkp2b3Q1bTFCRTFwZCthNE5LNUhTUFkKNXdUYXpPaTNta1BmdlJFMHcrL0dnQTRyZnU3dThMcFhobjkrZStvNktKbFBweUkxTUhaRm1PeHJtRm50RSs5SAoyZFdLMXIrSHpwb0dMVzd4bmpXSkZkSjh0NHdzTGhaZWtlbitXK1FGQlJLbGdIWU9UaFBXM2FpMnZkcWZYZVdBCnRxUis4MVp0ZG1Pd1NWZGhBSEVjWUYvZnNuNklkejA1ZmVldWU1UnpQb2VIZVdoWE1sRHh2UTBxalgwS1ZWV2IKdWFieFZYYmF2K3BPMVMzcTNsdGN5OEFQUVFDWUNFYmhELy9CYndJREFRQUJBb0lCQVFDZ0RXc2U4T3ZyZG92ZAp3T2xCWnAxNGJJM2Mwdnlsei9lZFp0SmRabUJGT2V4N0xULytPSmdhSFpSV1lSak52WlRXcHZyTDdYb0FYaHo0CmhVWnNBZ1dGVkh4NzIrYWxzV0ZqU3daSTA2UVpBWm03VGZvaUpEVnJFQ0x5RUlXbXpLb1l2Z0JjenBQMnBUUUcKMlZqS2w1Vm94eWV3UU82bTlGcHMyR1JUOWZYODRjMS90bUkraTZOOVN1ak5wcDloVzhoMS81cjBaRnZha2VJSgpxaTgyb2IyMFhWQmhOeGg0Z0x1YWw2aGtiTE9WUWVmNWZSTThxOFRVWlVGTjRycEpvbi9XaHNBTnMxRkpaVE5kCngza2JHZWh0TEc0OXNzSzdqVG5KK0ZFeEZSM0pjZjBieHJKV1JEbkJFN0JCMGxzSmR4anVUTFhCVlZ1dGxHazMKVDNENWtQckJBb0dCQVBDelNiWDVuTnl1NjZaVmNLcTB4a3plNVhiaUR4Ri95aW1yaUtybDRHU1czSWx5bnNBdQprVG5rVGtUVDB1YithSzJGUlJiQ3hIM3dCMW9HQjlhUG1RaFBtY1BDMGJ1TlJLN3M2MGhnWVV1Y0o3czNPdE5KCk5ubm1JN0laTVpQTURweHVoekFhNHU2NGFBZ21QNWdkNWpVSjJkNGZtZkJrU1YyYzlwRG9xM29OQW9HQkFOeDUKNGhHck5mVk9BTWJCNVpQdTNrTkdzdkhzaURRalA3TWZOVTB2MjRUYW9KS3d1dXR1L1NKUE5FQ3FGbkx1S2tkWgo2cDZ1akN5UC9LNlBvQUlqSjZjMHJGUkRIbHMxQk1PTU5VQlBmdGNVUW5KSHo0Ty93ZUt2U2FWRGsvU0FXWTFICnpkS2toVEdvaG1yZ2tJcnhvaDRsUTViT0YxSnp1WlR3L3ozYXFUWnJBb0dCQU14ZW5qNXBjenVaTmJKa0p5WjYKR1VrWmxHR05iVmZoVmVodG9idmhOTmFUbFNzSzdDbW5JRjIwTUpTVitpTnhiYld2UzBzWkVqY1AvMTM3Y3RwRgowSnpTNFc3cTBxTlpQakQ4TG9Xa2Q5ZjMvWEFqWThvVUJySVhxc1ZFU09rQndJSW9BcGJnclVBZHlRN3FVdUs0CnVFYmVWMk1YRitDWmRnV0xDWHRlWW9KZEFvR0JBTkpVY0VlODFzLzdKeUIxNzRjdUZObUhnOFRwaXBKNm9oVkcKaTNua1V2NHQ5NHVaaitoMFRJYkRtcXlwMXBxei9KOXU5eldFZlBNeU5iTnVEdzZhN1FSRmFyVkVCcHlxT3E0Mgpmc0tvVSsvcFF1NTA5VkhSeUt4eDNzY0xiZ1dOd0dEWWhGRVVaSUNZTGd1ZHlpYlRGMzY4dS9zTkJ4REFsK1d2Cjl6L1I3eVdiQW9HQUx6Qk5WSTh0U2RkbTYwWEhpV
FJVL012dzdFOHNXbU1Gdlp2VGRXdmZMcVY3N0V2cE1FdEEKdTJzQmhNdUJHbkVvVjV0ZDR6aVVpb2xDQms2c2UxNi8xWlpydUtvaGJIdFZobXlTTG8xalFtSXlicjB4RFBOYgpOY3k0UHNkNTVhVEZNSVd3RWhwbWovMjB0UmlQaGNadlBEWlVleHBIWDRHYi9mRmZ4QTJlTGc4PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
kind: Secret
metadata:
  creationTimestamp: "2024-06-17T18:45:05Z"
  labels:
    app.juju.is/created-by: istio-pilot
    app.kubernetes.io/instance: istio-pilot-istio-441
    kubernetes-resource-handler-scope: gateway
  name: istio-gateway-gateway-secret
  namespace: istio-441
  resourceVersion: "1419"
  uid: 0dfe5ae8-ed60-4f54-8e3f-b3b6abed5e4e
type: kubernetes.io/tls

Based on this, the relation and the reconciler in istio-operators appear to be working as expected.
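
For anyone who wants to repeat this check without reading the base64 by eye, here is a minimal sketch. It assumes the Secret has already been loaded into a Python dict (for example from the YAML above), and check_tls_secret is a hypothetical helper, not part of any charm or library. It only checks the Secret type and the PEM header lines, not the validity of the certificate or key.

```python
import base64

# Expected ending of the first PEM line for each Secret field.
PEM_SUFFIXES = {
    "tls.crt": "CERTIFICATE-----",
    "tls.key": "PRIVATE KEY-----",
}


def check_tls_secret(secret):
    """Return a list of problems found in a kubernetes.io/tls Secret dict."""
    problems = []
    if secret.get("type") != "kubernetes.io/tls":
        problems.append("type is not kubernetes.io/tls")
    data = secret.get("data") or {}
    for field, suffix in PEM_SUFFIXES.items():
        raw = data.get(field)
        if not raw:
            problems.append(f"{field} is missing or empty")
            continue
        try:
            decoded = base64.b64decode(raw, validate=True).decode("ascii")
        except (ValueError, UnicodeDecodeError):
            problems.append(f"{field} is not valid base64-encoded text")
            continue
        lines = decoded.splitlines()
        first_line = lines[0] if lines else ""
        if not (first_line.startswith("-----BEGIN ") and first_line.endswith(suffix)):
            problems.append(f"{field} does not start with a PEM header")
    return problems


if __name__ == "__main__":
    sample = {
        "type": "kubernetes.io/tls",
        "data": {
            "tls.crt": base64.b64encode(
                b"-----BEGIN CERTIFICATE-----\n...\n"
            ).decode(),
            "tls.key": base64.b64encode(
                b"-----BEGIN RSA PRIVATE KEY-----\n...\n"
            ).decode(),
        },
    }
    print(check_tls_secret(sample))
```

An empty list means both fields decode to text that begins with the expected PEM header.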

DnPlas commented 1 week ago

I have confirmed with @sed-i that this issue is caused by the cert_handler library not handling relation-broken events correctly. I have tested the fix in https://github.com/canonical/observability-libs/pull/99 and it seems to be working for v0 of the library.

To fix the issue @dparv reported, we'll have to:

  1. Wait for https://github.com/canonical/observability-libs/pull/99 to be merged
  2. Bump the library to bring in all the changes for cert_handler v0

For more recent versions of the istio-operators we'd ideally use the cert_handler v1, which we'll also have to pull once the mentioned PR is merged.

DnPlas commented 1 week ago

I have submitted multiple PRs to bump the cert_handler library as it was recently updated. Thanks @sed-i!

@dparv we'll soon be publishing a new revision of istio-pilot 1.17/stable that includes the newer library version. We'll keep you posted.

DnPlas commented 5 days ago

The fix has been released to 1.17/stable. Closing this issue, but feel free to re-open or file a new one should you find any other error. Thanks!