Open ghislainbourgeois opened 1 year ago
On traefik side, we have a matching config in place, https://github.com/canonical/traefik-k8s-operator/blob/4086e3ba3fee02c139919c15b2ca07ca86c68efe/src/charm.py#L1065-L1071
but we do not have a matching cert for that, as we always generate a CSR with traefik's external hostname (config option) or IP address (metallb).
Yes, the CertHandler approach was designed with only path routing in mind.
request new certificates for each Ingress when set to the subdomain routing mode
Seems like the correct approach, but relation count would have to be artificially limited by the charm because the number of SANs per cert is limited.
Seems like the correct approach, but relation count would have to be artificially limited by the charm because the number of SANs per cert is limited.
This would only be an issue if we keep appending SANs to the same certificate. If we instead create one CSR per known subdomain so they all have individual certs, it won't be a concern. That approach, however, potentially opens up for rate-limiting issues with some public CAs like Let's Encrypt (50 certs per registered domain per week)
potentially opens up for rate-limiting issues with some public CAs like Let's Encrypt (50 certs per registered domain per week)
The rate limit is a similar concern for both approaches:
BTW the max number of SANs per cert depends on the implementation:
No upper bound is defined; implementations are free to choose an upper bound that suits their environment. -- (rfc via SO)
# Count SANs with:
echo | openssl s_client -showcerts -connect google.com:443 | openssl x509 -noout -text | grep DNS | tr ',' '\n' | wc -l
One cert for all subdomains seems simpler but more restricting:
Different cert per subdomain | One cert for all subdomains | |
---|---|---|
Count limit | Unlimited (subject to CA rate limit and juju relation count limit) | Depends on CA impl'd limit for SANs per cert |
Num CSRs | New CSR per add subdomain; revocation request per removed subd. | New CSR for every added/removed subdomain |
Config changed from path to subdomain | Delete 1 cert; iterate N relations to create N CSRs -> N certs | Create new CSR; overwrite the one cert |
Config changed from subdomain to path | Delete N certs; created one CSR | Create new CSR; overwrite the one cert |
Remote unit scale up/down | Create/revoke csr; add/delete cert (for IPU, could try to always place all units of the same app in the same cert but if num units > 100 we hit CA count limit) | Create new CSR; overwrite the one cert |
Future support for custom path/subdomain | No issue | No issue |
Future support for wildcard certs | Would simplify everything, with LOTS of code change | Would simplify everything, with little code change |
Thoughts?
Another though:
Currently, traefik treats ingress per app and per unit very similarly for subdomain routing:
juju run trfk/0 show-proxied-endpoints --format json | jq -r '."trfk/0".results."proxied-endpoints"' | jq
{
"prom/0": {
"url": "http://sbds-prom-0.cluster.local/"
},
"am": {
"url": "http://sbds-am.cluster.local/"
}
}
If we change things up a bit:
{
"prom/0": {
"url": "http://unit-0.prom.sbds.cluster.local/"
},
"am": {
"url": "http://am.sbds.cluster.local/"
}
}
Then:
*.prom.sbds.cluster.local
Wdyt @simskij @ghislainbourgeois @PietroPasotti @mthaddon?
FYI I'm facing this issue on the IAM bundle deployment, we need a TLS certificate for the ingress publicly exposing hydra, kratos, and login-ui but the CSR only covers the hostname. This is blocking since I've no way to provide a certificate that'll work for the relations using subdomain routing
# Count SANs with: echo | openssl s_client -showcerts -connect google.com:443 | openssl x509 -noout -text | grep DNS | tr ',' '\n' | wc -l
- Google has 134 subdomains in one cert (and they also use wildcard SANs)
- Wikipedia has 41 subdomains in one cert (and they also use wildcard SANs)
- Wordpress and blogger have wildcard SAN.
- StackExchange projects have wildcard SAN.
- Government and bank websites have a separate cert per subdomain (and they do not use wildcard SANs)
- Let's Encrypt limits to 100 SANs per cert (ref)
One cert for all subdomains seems simpler but more restricting:
Different cert per subdomain One cert for all subdomains Count limit Unlimited (subject to CA rate limit and juju relation count limit) Depends on CA impl'd limit for SANs per cert Num CSRs New CSR per add subdomain; revocation request per removed subd. New CSR for every added/removed subdomain Config changed from path to subdomain Delete 1 cert; iterate N relations to create N CSRs -> N certs Create new CSR; overwrite the one cert Config changed from subdomain to path Delete N certs; created one CSR Create new CSR; overwrite the one cert Remote unit scale up/down Create/revoke csr; add/delete cert (for IPU, could try to always place all units of the same app in the same cert but if num units > 100 we hit CA count limit) Create new CSR; overwrite the one cert Future support for custom path/subdomain No issue No issue Future support for wildcard certs Would simplify everything, with LOTS of code change Would simplify everything, with little code change Thoughts?
I would be more in favor of different cert per subdomain or wildcard certificate. One cert for all domain can be tricky if the user chooses the "tls-certificates-operator" that allow providing certificates manually. Any relation added/removed will require the re-issue of the certificate manually.
Diagram summary of what's currently under consideration:
Small comment about using wildcard certificates. Wildcard certificates can't be validated using HTTP challenge in ACME protocol. We don't have any provider charm that uses HTTP challenge, but this could have consequences
Per discussion:
Bug Description
When using Traefik in the
subdomain
routing mode, all Ingress will use Traefik's default certificate.CertHandler
manages a single certificate for Traefik, and uses theexternal_hostname
as the subject andsans
. New Ingresses generated hostnames will not be represented in the certificate, and the SNI matching will not work, thus serving the default certificate.I see 3 ways of fixing this issue:
CertHandler
to support multiple certificates, and when set to thesubdomain
routing mode, request new certificates for each Ingress.CertHandler
and use the tls library directly to request new certificates for each Ingress when set to thesubdomain
routing mode.CertHandler
as is and addf"*.{external_hostname}"
to the list of sans to request a wildcard certificate. This would have negative security implications however and I do not think it should be implemented; I am only mentioning it for completeness' sake.To Reproduce
juju deploy traefik-k8s --trust --channel edge --config routing_mode=subdomain --config external_hostname=potato.com
juju deploy zinc-k8s --trust --channel edge
juju deploy self-signed-certificates --trust --channel edge
juju relate self-signed-certificates:certificates traefik-k8s:certificate
juju relate zinc-k8s:ingress traefik-k8s:ingress
Environment
This was tested with every charm from the edge channel, running locally on MicroK8s v1.27.5 revision 5892 with Juju 3.1.5-genericlinux-amd64.
Relevant log output
Additional context
No response