Closed Abuelodelanada closed 5 months ago
Hitting this too now.
Repro steps:
juju deploy traefik-k8s --channel edge traefik
juju deploy self-signed-certificates ssc
juju relate traefik:certificates ssc
jhack eval traefik/0 self._get_certs()
# None, None, None
I suspect traefik is looking up some data in secrets, but the remote end is publishing it via relation data, in the clear.
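To make that suspicion concrete, here is a toy Python model of the mismatch. It is not traefik's code: `get_certs_from_secret` and the key names are hypothetical. It only shows why a consumer that reads solely from a secret would see `None, None, None` while the provider has written everything to the relation databag.

```python
from typing import Dict, Optional, Tuple

Certs = Tuple[Optional[str], Optional[str], Optional[str]]


def get_certs_from_secret(secret: Dict[str, str]) -> Certs:
    """Hypothetical consumer side: only ever looks inside a secret."""
    return (
        secret.get("private-key"),
        secret.get("ca-cert"),
        secret.get("server-cert"),
    )


# Hypothetical provider side: certificates published as plain relation data.
relation_data = {"ca-cert": "<ca pem>", "server-cert": "<server pem>"}

# Nothing was ever copied into the secret the consumer reads from.
secret_content: Dict[str, str] = {}

print(get_certs_from_secret(secret_content))
```

If this is what is happening, the data is present on the wire but never reaches the place the charm reads it from.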
I tried to reproduce this with what I currently have installed: Juju 3.4.2 on an 8-CPU, 16 GB multipass VM. All 3 attempts went totally fine. This looks like either a code ordering issue or a stable vs edge difference for self-signed-certificates.
bundle: kubernetes
applications:
  prom:
    charm: prometheus-k8s
    channel: latest/edge
    revision: 182
    scale: 1
    trust: true
  ssc:
    charm: self-signed-certificates
    channel: latest/edge
    revision: 137
    scale: 1
  trfk:
    charm: traefik-k8s
    channel: latest/edge
    revision: 184
    scale: 1
    trust: true
relations:
- - prom:ingress
  - trfk:ingress-per-unit
- - ssc:certificates
  - trfk:certificates
I'm seeing
INFO unit.trfk/0.juju-log Creating CSR for 10.43.8.188 with DNS [] and IPs ['10.43.8.188']
and prometheus is reachable:
$ curl -k 10.43.8.188/rwdrop-prom-0/api/v1/targets
{"status":"success","data":{"activeTargets": ...}}
Also, traefik-k8s from commit 5a1a160 doesn't have TLSNotEnabledError anywhere.
But end-to-end TLS seems to be broken - curl https works from trfk container but not from outside:
relations:
- - prom:ingress
  - trfk:ingress-per-unit
- - ssc:certificates
  - trfk:certificates
- - ssc:certificates
  - prom:certificates
- - ssc:send-ca-cert
  - trfk:receive-ca-cert
$ juju ssh --container traefik trfk/0 curl https://prom-0.prom-endpoints.rwdrop.svc.cluster.local:9090/api/v1/targets
{"status":"success","data":{"activeTargets":...}}
$ curl -v -L -k https://10.43.8.188/rwdrop-prom-0/api/v1/targets
* Trying 10.43.8.188:443...
* Connected to 10.43.8.188 (10.43.8.188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Unknown (21):
* TLSv1.3 (IN), TLS alert, unrecognized name (624):
* error:0A000458:SSL routines::tlsv1 unrecognized name
* Closing connection 0
curl: (35) error:0A000458:SSL routines::tlsv1 unrecognized name
And trfk itself complains:
2024-05-01T04:31:13.051Z [traefik] time="2024-05-01T04:31:13Z" level=debug msg="Serving default certificate for request: \"\""
2024-05-01T04:31:13.051Z [traefik] time="2024-05-01T04:31:13Z" level=debug msg="http: TLS handshake error from 10.43.8.188:41134: tls: no certificates configured"
...because certs config and certs are missing from /etc/traefik. Seems like another potential code ordering problem.
... and they're not there because they're not being pushed, as _get_certs returns None, None, None. I think it's the same issue.
I tried again after redeploying ssc from stable and traefik from edge. The data is in the certificates relation data, and some of it gets transferred into a secret for traefik-internal usage, but not all of it: traefik stores a private key but not the CA cert or server cert.
This should be fixed by the certhandler 1.7
This affects rev<186, including latest/stable at time of writing (rev180), as reported by SolQA. I've confirmed that latest/stable has this issue and that latest/candidate (rev191) is not affected, so I think the best resolution here is to get candidate promoted to stable.
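For anyone checking whether a deployment falls in the affected range, a minimal Python sketch follows. It assumes the output of `juju status --format=json` has been parsed into a dict and that the application entry carries a `charm-rev` key. The helper names are hypothetical and the threshold of 186 comes straight from the comment above.

```python
from typing import Any, Dict

# Revisions below this are affected, per the comment above.
FIRST_UNAFFECTED_REV = 186


def is_affected(rev: int) -> bool:
    """Return True if a traefik-k8s revision is in the affected range."""
    return rev < FIRST_UNAFFECTED_REV


def charm_rev(status: Dict[str, Any], app: str = "traefik") -> int:
    """Read the charm revision of `app` from parsed status JSON.

    Assumption: the revision is stored under the "charm-rev" key.
    """
    return int(status["applications"][app]["charm-rev"])


# Hand-written sample mirroring the revisions mentioned in this thread.
stable = {"applications": {"traefik": {"charm-rev": 180}}}
candidate = {"applications": {"traefik": {"charm-rev": 191}}}

print(is_affected(charm_rev(stable)))
print(is_affected(charm_rev(candidate)))
```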
Resolved by this promotion of candidate (rev191) to stable
Bug Description
Traefik ends up in error state when deploying COS-Lite bundle using the TLS overlay.
This issue seems to be related to https://github.com/canonical/traefik-k8s-operator/issues/330
To Reproduce
juju deploy cos-lite --channel=edge --trust --overlay ./tls-overlay.yaml --overlay ./offers-overlay.yaml
error state:

App           Version  Status   Scale  Charm                     Channel      Rev  Address         Exposed  Message
alertmanager  0.27.0   active       1  alertmanager-k8s          latest/edge  109  10.152.183.146  no
ca                     active       1  self-signed-certificates  latest/edge  135  10.152.183.30   no
catalogue              active       1  catalogue-k8s             latest/edge   38  10.152.183.67   no
grafana       9.5.3    active       1  grafana-k8s               latest/edge  111  10.152.183.117  no
loki          2.9.5    active       1  loki-k8s                  latest/edge  135  10.152.183.142  no
prometheus    2.50.1   active       1  prometheus-k8s            latest/edge  176  10.152.183.150  no
traefik       v2.11.0  waiting      1  traefik-k8s               latest/edge  180  192.168.1.250   no       installing agent

Unit            Workload  Agent  Address      Ports  Message
alertmanager/0  active    idle   10.1.165.27
ca/0            active    idle   10.1.165.26
catalogue/0     active    idle   10.1.165.37
grafana/0       active    idle   10.1.165.41
loki/0          active    idle   10.1.165.32
prometheus/0    active    idle   10.1.165.23
traefik/0*      error     idle   10.1.165.48         hook failed: "certificates-relation-changed" for ca:certificates
Model  Controller  Cloud/Region        Version  SLA          Timestamp
cos    microk8s    microk8s/localhost  3.4.2    unsupported  12:23:22-03:00

App           Version  Status  Scale  Charm                     Channel      Rev  Address         Exposed  Message
alertmanager  0.27.0   active      1  alertmanager-k8s          latest/edge  109  10.152.183.146  no
ca                     active      1  self-signed-certificates  latest/edge  135  10.152.183.30   no
catalogue              active      1  catalogue-k8s             latest/edge   38  10.152.183.67   no
grafana       9.5.3    active      1  grafana-k8s               latest/edge  111  10.152.183.117  no
loki          2.9.5    active      1  loki-k8s                  latest/edge  135  10.152.183.142  no
prometheus    2.50.1   active      1  prometheus-k8s            latest/edge  176  10.152.183.150  no
traefik       v2.11.0  active      1  traefik-k8s               latest/edge  180  192.168.1.250   no

Unit            Workload  Agent  Address      Ports  Message
alertmanager/0  active    idle   10.1.165.27
ca/0            active    idle   10.1.165.26
catalogue/0     active    idle   10.1.165.37
grafana/0       active    idle   10.1.165.41
loki/0          active    idle   10.1.165.32
prometheus/0    active    idle   10.1.165.23
traefik/0*      active    idle   10.1.165.48
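When comparing status dumps like the ones above, it can help to pull out only the failing units. The Python sketch below assumes the parsed output of `juju status --format=json`, with each unit exposing `workload-status.current` and `workload-status.message`. The function name is hypothetical and the sample dict is hand-written from the values in this report, not captured output.

```python
from typing import Any, Dict, List, Tuple


def units_in_error(status: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (unit name, message) for every unit whose workload is in error."""
    failing = []
    for app in status.get("applications", {}).values():
        # "units" may be missing or null for applications without units.
        for unit_name, unit in (app.get("units") or {}).items():
            workload = unit.get("workload-status", {})
            if workload.get("current") == "error":
                failing.append((unit_name, workload.get("message", "")))
    return failing


# Hand-written sample based on the unit table in this report.
sample = {
    "applications": {
        "ca": {"units": {"ca/0": {"workload-status": {"current": "active"}}}},
        "traefik": {
            "units": {
                "traefik/0": {
                    "workload-status": {
                        "current": "error",
                        "message": 'hook failed: "certificates-relation-changed" for ca:certificates',
                    }
                }
            }
        },
    }
}

for name, message in units_in_error(sample):
    print(f"{name}: {message}")
```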
Additional context
No response