Closed ruimarinho closed 1 year ago
Hello Rui! Thanks for such a detailed and an actionable report 🙌
Most of the issues you've listed should be addressed by the upcoming PR https://github.com/cirruslabs/orchard/pull/90, expect for the SNI, which is something we need to think about on how to implement properly to support cases where the FQDN (or an IP-address) of an Orchard Controller is not evident/static at the time of creation of the certificate.
@ruimarinho please check out the new 0.8.0 release, it includes the fixes for the issues you've mentioned.
Yep, just saw it! Thank you! I’ll provide feedback next week as I’m away this week.
@edigaryev just tried 0.8.0 and I'm still getting the same issue, with and without --no-pki
:
❯ orchard context create --name example-0-8-0 --service-account-name bootstrap-admin --service-account-token <token> https://orchard.example.internal:443
❯ orchard list vms
2023/06/21 20:21:32 API client failed to make a request: Get "https://orchard.example.internal:443/v1/vms": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not orchard-controller
With --no-pki
:
❯ orchard context create --name example-0-8-0-no-pki --service-account-name bootstrap-admin --service-account-token <token> https://orchard.example.internal:443 --no-pki
The authencity of controller orchard.example.internal:443 cannot be established.
Certificate SHA-256 fingerprint is 2B FD [...].
Are you sure you want to establish trust to this certificate? (yes/no) yes
2023/06/21 20:21:47 API client failed to make a request: Get "https://orchard.example.internal:443/v1//": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not orchard-controller
Any idea what I could be doing wrong?
Just figured out that orchard context create
does not automatically switch contexts, so the first command (which had returned successfully) actually works if switching to that context. 👌
For future reference, that's simply orchard context default <new-context>
.
@edigaryev it seems like worker.go
doesn't perform SNI parsing from the host.
/opt/homebrew/bin/orchard worker run https://orchard.example.internal:443
{"level":"warn","ts":1687429163.497738,"caller":"worker/worker.go:109","msg":"failed to register worker: API client failed to make a request: Post \"https://orchard.example.internal:443/v1/workers\": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not orchard-controller"}
2023/06/22 11:19:23 failed to register worker on the controller
Could you please have a look?
Just to add more debugging information, it looks like the bootstrap-token command also breaks when no certificate is presented:
❯ orchard get bootstrap-token ci
2023/06/22 11:40:24 failed to create bootstrap token: empty certificate
Since the bootstrap-token contains the certificate as well, I've tried removing it from the token itself (i.e. calling /opt/homebrew/bin/orchard worker run https://orchard.example.internal:443 --bootstrap-token <token>
where token does not contain the certificate component).
This yields:
2023/06/22 11:41:30 invalid bootstrap token format: missing certificate
Hello @ruimarinho 👋
Thanks for reporting this! I've totally overlooked the fact that we also need to to add the certificate-less support for bootstrap tokens.
Should be fixed as a part of https://github.com/cirruslabs/orchard/pull/93.
Hi,
First of all, thanks for open sourcing this project! I'd like to provide some feedback on the current implementation of TLS to understand if there is a better path going forward in terms of how it is architected today.
As you may already know, on a Kubernetes deployment, the ingress typically performs the TLS termination and then proxies traffic to the service, which could either by to a HTTP or HTTPS backend.
The first problem is that TLSv1.3 is being required exclusively, requiring changes to the typical nginx ingress controller configuration (the annotation won't work due to other requirements):
With this change, the nginx controller is able to reach the HTTPS backend with the self-signed certificate created by orchard.
Recommendation: TLSv1.2 should also be supported to avoid custom server snippets.
The approach taken by the team to have
orchard
server generate a self-signed certificate is fine, but unfortunately theorchard
client is forcing validation of a single certificate. This makes it very difficult - if not impossible - to have the kubernetes nginx controller become responsible for the orchard client TLS termination, since a proper setup will likely include a chain of the Root CA and/or Intermediate CA(s). In my case, the edge TLS certificate (e.g. orchard-controller.example.internal) and an Intermediate CA are being bundled by the nginx controller, breaking theorchard
client.Recommendation: validate the presented certificate or certificated chain against locally-trusted Root CAs.
The client has an hard-coded SNI check for orchard-controller.. This makes it incompatible with custom internal domains where, for example, the aforementioned nginx controller is responsible for handling TLS. It also makes it impossible to implement a workaround using nginx's HTTPS passthrough (which isn't ideal anyway).
Recommendation: do not forcefully use a hard-coded SNI and instead use whatever hostname is passed via
orchard context create [...] https://<hostname>:<port>
.Although the previous limitations already make it almost impossible to use custom certificates, there is also another bug which prevents a custom certificate private key to be loaded. Both parameters point to the public key, even if the private is passed in.
Please note that TLS config does not seem to be consistently used across all calls, which makes the behaviour differ depending on the type of call
orchard
command being executed.Lastly, for others attempting to run this on Kubernetes, the best option so far is to publish a LoadBalancer service type directly pointing to the
orchard
controller. The only downside is the constant spam from cloud providers' use of TLSv1.2 for health checks on the origin, whichorchard
rejects.