Closed ldunkum closed 5 months ago
🤖
Based on the information you've provided, the handshake failure you see when running a workflow with the flyte-binary chart behind Traefik as the Ingress controller in your EKS cluster may be because gRPC requires HTTP/2, and not all Ingress controllers support HTTP/2 for backend services.
In the provided context, the Ingress configuration for Flyte is set up to handle both HTTP and gRPC routes. However, not all Ingress controllers can serve both HTTP/1 and gRPC with a single Ingress, because gRPC can only be enabled on the Ingress object, not on backend services. This is mentioned in the comments of the provided context:
# Certain ingress controllers like nginx cannot serve HTTP 1 and GRPC with a single ingress because GRPC can only
# enabled on the ingress object, not on backend services (GRPC annotation is set on the ingress, not on the services).
Traefik, the Ingress controller you're using, supports HTTP/2, but it needs to be enabled explicitly. If it's not enabled, you might experience handshake failure errors when trying to run gRPC workflows.
To resolve this issue, you need to ensure that HTTP/2 is enabled in your Traefik configuration. If you're using Traefik v2.x, you can enable HTTP/2 by adding the following to your Traefik configuration:
entryPoints:
  web:
    address: ":80"
    http:
      http2: true
Please adjust the configuration according to your setup. If you're using a different version of Traefik or have a different setup, the configuration might be different. Please refer to the Traefik documentation for more details.
Regarding the insecureSkipVerify configuration in the Flyte CLI, it is a boolean flag that controls whether a client verifies the server's certificate chain and host name during a gRPC call. If insecureSkipVerify is set to true, the client will not verify the server's certificate chain and host name, which can be insecure and is not recommended for production use cases. This configuration is particularly useful in development or testing environments where you might be using self-signed certificates. In such cases, you can set insecureSkipVerify to true to bypass the SSL certificate verification.
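For reference, in a flytectl client configuration this flag sits in the admin section. The snippet below is a minimal sketch, not a verified setup: the endpoint reuses the placeholder host from this issue, and the file location (commonly ~/.flyte/config.yaml) is an assumption about your environment.

```yaml
admin:
  # Placeholder host; replace with your own Flyte endpoint.
  endpoint: dns:///flyte.example.com
  # Keep TLS enabled for the connection itself.
  insecure: false
  # Skip certificate chain and host name verification (development only).
  insecureSkipVerify: true
```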
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
I'm not sure Flyte supports Traefik, especially due to the absence of some of the annotations required by Flyte. I think @flixr managed to make it work, but not with `flyte-binary`?
Thanks for the reply @davidmirror-ops!
I was under the impression that some folks managed to get a deployment with Traefik working in the Slack channel.
Can you point to the annotations that Flyte requires that aren't supported by Traefik?
Hey @ldunkum,
we have flyte (flyte-core) running with traefik.
When I set it up, I used the ingress rules that were created, removed the grpc stuff and created an IngressRoute for the grpc endpoint:
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: flyte-core-grpc
  namespace: flyte
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`flyte.k3s.roboception.de`) && (PathPrefix(`/flyteidl.{service:.*}`) || PathPrefix(`/grpc.health.v1.Health`))
      services:
        - kind: Service
          name: flyteadmin
          namespace: flyte
          port: grpc
          scheme: h2c
Hey @flixr, thanks for your reply. As far as I can tell, we tested basically the same configuration:
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: flyte-binary-grpc-ingressroute
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`flyte.example.com`) && PathPrefix(`/grpc.health.v1.Health/*`)
      services:
        - kind: Service
          name: flyte-flyte-binary-grpc
          namespace: dev-flyte
          port: grpc
          scheme: h2c
          passHostHeader: true # default
As our configuration doesn't work, there is probably something else going on, perhaps our Traefik values are different?
If you spot something, please let me know, the help would be greatly appreciated!
I'm running traefik 2.9.6 atm. But it looks like your IngressRoute is incomplete; it should also match PathPrefix(`/flyteidl.{service:.*}`).
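Combined with the health-check path, the rule could look like the following sketch. It reuses the names from the configuration above (flyte-flyte-binary-grpc, dev-flyte, flyte.example.com) and has not been tested against the flyte-binary chart.

```yaml
routes:
  - kind: Rule
    # Traefik v2 syntax: {service:.*} is a named regex inside PathPrefix.
    match: Host(`flyte.example.com`) && (PathPrefix(`/flyteidl.{service:.*}`) || PathPrefix(`/grpc.health.v1.Health`))
    services:
      - kind: Service
        name: flyte-flyte-binary-grpc
        namespace: dev-flyte
        port: grpc
        scheme: h2c
```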
We're using traefik v2.11.0, and we hadn't looked much into IngressRoutes before, so we simply used different rules for each path. Your solution is much cleaner, so we'll switch to that.
An example of a path config we used:
- kind: Rule
  match: Host(`flyte.example.com`) && PathPrefix(`/flyteidl.service.AdminService/*`)
  services:
    - kind: Service
      name: flyte-flyte-binary-grpc
      namespace: dev-flyte
      port: grpc
      scheme: h2c
      passHostHeader: true
@ldunkum is Traefik working on your env?
Traefik in general is working great: it's been deployed for a few years, and we have multiple ingresses that work flawlessly. It's still not working with flyte, however. We will look at fixing that during the next two weeks and perhaps try switching to the flyte-core chart.
We actually got this working thanks to this thread in the Traefik support forums. The cipher suites we had were not compatible with the grpc client flyte uses.
@Jeinhaus thanks for confirming. Any chance you could share the final ingress config you used to make it work with Flyte? Just in case others find this thread useful.
@davidmirror-ops yes. I'll try to get some PRs open for this, because it was not only traefik's tlsOptions but also a missing grpc service in the flyte-core chart.
For now, the tlsOptions we used that worked were:
tlsOptions:
  default:
    minVersion: VersionTLS12
    cipherSuites:
      - "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA"
      - "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA"
      - "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256"
      - "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"
      - "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384"
      - "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305"
      - "TLS_CHACHA20_POLY1305_SHA256"
      - "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
      - "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
      - "TLS_FALLBACK_SCSV"
      # Important for GRPC, see
      # https://community.traefik.io/t/how-to-disable-two-ciphersuites-and-tls1-1-without-breaking-grpc/17647/5
      - "TLS_AES_128_GCM_SHA256"
      - "TLS_AES_256_GCM_SHA384"
      - "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
      - "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
      - "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384"
      # These are necessary for Win7 users with IE11
      - "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"
      - "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA"
    sniStrict: true
    curvePreferences: []
The important part was curvePreferences: []. Without that, the grpc calls failed.
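One possible explanation, which this thread does not confirm: HTTP/2 over TLS 1.2 requires support for TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 with the P-256 curve (RFC 7540, section 9.2.2), so a cipher or curve list that rules either out can break gRPC clients. A trimmed-down sketch of the same Helm values, keeping only the suites marked as gRPC-relevant above, might look like this. It is untested, and the behaviour of the empty curve list is an assumption.

```yaml
tlsOptions:
  default:
    minVersion: VersionTLS12
    cipherSuites:
      # Mandatory for HTTP/2 over TLS 1.2 per RFC 7540, section 9.2.2.
      - "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
      - "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
      # TLS 1.3 suites.
      - "TLS_AES_128_GCM_SHA256"
      - "TLS_AES_256_GCM_SHA384"
    # Empty list: no explicit curve restriction (assumed to fall back to the
    # defaults, which keeps P-256 available).
    curvePreferences: []
```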
Is there something that can be done on flyte's side to not require this, @davidmirror-ops?
Something of note for anyone that comes across this in the future:
Traefik v3 removed regex matching from PathPrefix, therefore we switched to using PathRegexp.
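As a sketch of what that change can look like, here is the gRPC route from earlier in this thread rewritten for Traefik v3. The traefik.io/v1alpha1 API group, the placeholder host, and the exact regular expression are assumptions to adapt to your deployment; the service name and namespace are taken from the flyte-core example above.

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: flyte-core-grpc
  namespace: flyte
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      # PathRegexp takes a regular expression; in v3, PathPrefix is a plain prefix match.
      match: Host(`flyte.example.com`) && (PathRegexp(`^/flyteidl\..*`) || PathPrefix(`/grpc.health.v1.Health`))
      services:
        - kind: Service
          name: flyteadmin
          namespace: flyte
          port: grpc
          scheme: h2c
```

Since the gRPC service paths seen in this thread all start with /flyteidl., a plain PathPrefix(`/flyteidl.`) may be enough as well.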
Describe the bug
We're using traefik as an Ingress controller in our EKS cluster and are deploying the flyte-binary chart. The flyte-binary-grpc service has the annotation traefik.ingress.kubernetes.io/service.serversscheme: h2c, which according to traefik docs should be fine to serve GRPC traffic. Testing the GRPC endpoints with curl works fine, e.g.:
curl -v -X POST --http2 'https://flyte.example.com/grpc.health.v1.Health' -d "" -H 'Content-Type: application/grpc' -H 'Accept: application/grpc'
Trying to run a workflow fails with the following error message:
However, I can create a project from the CLI:
Expected behavior
I'd expect the CLI calls to work without error.
Additional context to reproduce