canonical / traefik-k8s-operator

https://charmhub.io/traefik-k8s
Apache License 2.0
11 stars 27 forks source link

Subdomain routing mode does not work properly with TLS certificates #244

Open ghislainbourgeois opened 1 year ago

ghislainbourgeois commented 1 year ago

Bug Description

When using Traefik in the subdomain routing mode, all Ingress will use Traefik's default certificate.

CertHandler manages a single certificate for Traefik, and uses the external_hostname as the subject and sans. New Ingresses generated hostnames will not be represented in the certificate, and the SNI matching will not work, thus serving the default certificate.

I see 3 ways of fixing this issue:

  1. Modify CertHandler to support multiple certificates, and when set to the subdomain routing mode, request new certificates for each Ingress.
  2. Drop CertHandler and use the tls library directly to request new certificates for each Ingress when set to the subdomain routing mode.
  3. Keep CertHandler as is and add f"*.{external_hostname}" to the list of sans to request a wildcard certificate. This would have negative security implications however and I do not think it should be implemented; I am only mentioning it for completeness' sake.

To Reproduce

  1. juju add-model potato
  2. juju deploy traefik-k8s --trust --channel edge --config routing_mode=subdomain --config external_hostname=potato.com
  3. juju deploy zinc-k8s --trust --channel edge
  4. juju deploy self-signed-certificates --trust --channel edge
  5. juju relate self-signed-certificates:certificates traefik-k8s:certificate
  6. juju relate zinc-k8s:ingress traefik-k8s:ingress
  7. https://potato-zinc-k8s.potato.com # this will use Traefik's default certificate

Environment

This was tested with every charm from the edge channel, running locally on MicroK8s v1.27.5 revision 5892 with Juju 3.1.5-genericlinux-amd64.

Relevant log output

There are not relevant logs.

Additional context

No response

sed-i commented 1 year ago

On traefik side, we have a matching config in place, https://github.com/canonical/traefik-k8s-operator/blob/4086e3ba3fee02c139919c15b2ca07ca86c68efe/src/charm.py#L1065-L1071

but we do not have a matching cert for that, as we always generate a CSR with traefik's external hostname (config option) or IP address (metallb).

Yes, the CertHandler approach was designed with only path routing in mind.

request new certificates for each Ingress when set to the subdomain routing mode

Seems like the correct approach, but relation count would have to be artificially limited by the charm because the number of SANs per cert is limited.

simskij commented 1 year ago

Seems like the correct approach, but relation count would have to be artificially limited by the charm because the number of SANs per cert is limited.

This would only be an issue if we keep appending SANs to the same certificate. If we instead create one CSR per known subdomain so they all have individual certs, it won't be a concern. That approach, however, potentially opens up for rate-limiting issues with some public CAs like Let's Encrypt (50 certs per registered domain per week)

sed-i commented 1 year ago

potentially opens up for rate-limiting issues with some public CAs like Let's Encrypt (50 certs per registered domain per week)

The rate limit is a similar concern for both approaches:

BTW the max number of SANs per cert depends on the implementation:

No upper bound is defined; implementations are free to choose an upper bound that suits their environment. -- (rfc via SO)

sed-i commented 1 year ago
# Count SANs with:
echo | openssl s_client -showcerts -connect google.com:443 | openssl x509 -noout -text | grep DNS | tr ',' '\n' | wc -l

One cert for all subdomains seems simpler but more restricting:

Different cert per subdomain One cert for all subdomains
Count limit Unlimited (subject to CA rate limit and juju relation count limit) Depends on CA impl'd limit for SANs per cert
Num CSRs New CSR per add subdomain; revocation request per removed subd. New CSR for every added/removed subdomain
Config changed from path to subdomain Delete 1 cert; iterate N relations to create N CSRs -> N certs Create new CSR; overwrite the one cert
Config changed from subdomain to path Delete N certs; created one CSR Create new CSR; overwrite the one cert
Remote unit scale up/down Create/revoke csr; add/delete cert (for IPU, could try to always place all units of the same app in the same cert but if num units > 100 we hit CA count limit) Create new CSR; overwrite the one cert
Future support for custom path/subdomain No issue No issue
Future support for wildcard certs Would simplify everything, with LOTS of code change Would simplify everything, with little code change

Thoughts?

sed-i commented 1 year ago

Another though:

Currently, traefik treats ingress per app and per unit very similarly for subdomain routing:

juju run trfk/0 show-proxied-endpoints --format json | jq -r '."trfk/0".results."proxied-endpoints"' | jq
{
  "prom/0": {
    "url": "http://sbds-prom-0.cluster.local/"
  },
  "am": {
    "url": "http://sbds-am.cluster.local/"
  }
}

If we change things up a bit:

{
  "prom/0": {
    "url": "http://unit-0.prom.sbds.cluster.local/"
  },
  "am": {
    "url": "http://am.sbds.cluster.local/"
  }
}

Then:

Wdyt @simskij @ghislainbourgeois @PietroPasotti @mthaddon?

gregory-schiano commented 1 year ago

FYI I'm facing this issue on the IAM bundle deployment, we need a TLS certificate for the ingress publicly exposing hydra, kratos, and login-ui but the CSR only covers the hostname. This is blocking since I've no way to provide a certificate that'll work for the relations using subdomain routing

gregory-schiano commented 1 year ago
# Count SANs with:
echo | openssl s_client -showcerts -connect google.com:443 | openssl x509 -noout -text | grep DNS | tr ',' '\n' | wc -l
  • Google has 134 subdomains in one cert (and they also use wildcard SANs)
  • Wikipedia has 41 subdomains in one cert (and they also use wildcard SANs)
  • Wordpress and blogger have wildcard SAN.
  • StackExchange projects have wildcard SAN.
  • Government and bank websites have a separate cert per subdomain (and they do not use wildcard SANs)
  • Let's Encrypt limits to 100 SANs per cert (ref)

One cert for all subdomains seems simpler but more restricting:

Different cert per subdomain One cert for all subdomains Count limit Unlimited (subject to CA rate limit and juju relation count limit) Depends on CA impl'd limit for SANs per cert Num CSRs New CSR per add subdomain; revocation request per removed subd. New CSR for every added/removed subdomain Config changed from path to subdomain Delete 1 cert; iterate N relations to create N CSRs -> N certs Create new CSR; overwrite the one cert Config changed from subdomain to path Delete N certs; created one CSR Create new CSR; overwrite the one cert Remote unit scale up/down Create/revoke csr; add/delete cert (for IPU, could try to always place all units of the same app in the same cert but if num units > 100 we hit CA count limit) Create new CSR; overwrite the one cert Future support for custom path/subdomain No issue No issue Future support for wildcard certs Would simplify everything, with LOTS of code change Would simplify everything, with little code change Thoughts?

I would be more in favor of different cert per subdomain or wildcard certificate. One cert for all domain can be tricky if the user chooses the "tls-certificates-operator" that allow providing certificates manually. Any relation added/removed will require the re-issue of the certificate manually.

sed-i commented 1 year ago

Diagram summary of what's currently under consideration: tls

gregory-schiano commented 1 year ago

Small comment about using wildcard certificates. Wildcard certificates can't be validated using HTTP challenge in ACME protocol. We don't have any provider charm that uses HTTP challenge, but this could have consequences

sed-i commented 1 year ago

Per discussion: