flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.43k stars 583 forks source link

[BUG] flyte-binary chart: webhook service name isn't respected #4954

Open ldunkum opened 6 months ago

ldunkum commented 6 months ago

Describe the bug

We deployed the flyte-binary chart and are having problems mounting secrets from k8s secrets.

This is the workflow:

secret = Secret(
    group="dotfile-secret",
    key=".secret-file",
)

@task(
  requests=Resources(cpu="100m", mem="500Mi"), limits=Resources(cpu="100m", mem="500Mi"),
  secret_requests=[secret]
)
def greeting_length(greeting: str) -> int:
    """A task the counts the length of a greeting."""
    secret_content = context.secrets.get("dotfile-secret", ".secret-file")
    return len(greeting)

This is the full error message in the flyte UI: Workflow[flytesnacks:development:workflows.example.wf] failed. RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [InternalError] failed to create resource, caused by: Internal error occurred: failed calling webhook "flyte-pod-webhook.flyte.org": failed to call webhook: Post "https://flyte-pod-webhook.dev-flyte.svc:443/mutate--v1-pod?timeout=10s": service "flyte-pod-webhook" not found

The service can't be reached, as it doesn't exist under the default name, the service name of the webhook was generated as flyte-flyte-binary-webhook. The configuration should be fine, as can be seen by the config file inside the container:

#/etc/flyte/config.d/000-core.yaml
webhook:
  serviceName: flyte-flyte-binary-webhook
flyte:
  propeller:
    disableWebhook: false
    disabled: false

It seems as if the service name from the config file isn't picked up, and instead the default value is used.

Expected behavior

We expect flyte to respect the configured name.

Additional context to reproduce

  1. Set flyte-binary.fullname to anything besides the default
  2. Try to mount secrets in a Task and execute said task

Screenshots

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

welcome[bot] commented 6 months ago

Thank you for opening your first issue here! 🛠

dosubot[bot] commented 6 months ago

🤖

Hey @ldunkum! Great to see you diving deep into Flyte again. Hope you're doing well!

Thank you for providing detailed information about the issue you're facing. It seems like the webhook service name isn't being respected, which is causing the error when trying to mount secrets from k8s secrets.

Based on the context provided, a similar issue was previously reported and resolved in the Flyte repository. The issue was titled [BUG] Propeller secrets webhook should respect service name and was resolved by a pull request: fix: Add servicename in certs. This suggests that the problem might have been fixed in a later version of Flyte. If you're not already using the latest version of Flyte, you might want to consider upgrading to see if that resolves the issue.

If you're already using the latest version of Flyte, or if upgrading doesn't resolve the issue, it might be due to the webhook service name not being correctly configured in the Helm values file. The service name is derived from the Helm helper function "flyte-binary.webhook.serviceName". If this value is not correctly set in the Helm values file, it could lead to the error you're experiencing.

Here is an example of how to correctly set the "flyte-binary.webhook.serviceName" value in your Helm values file:

flyte-binary:
  webhook:
    serviceName: my-webhook-service

Replace "my-webhook-service" with the name of your webhook service. After updating the Helm values file, you should redeploy the Helm chart for the changes to take effect.

The default value for the webhook service name in the flyte-binary chart is generated by appending "-webhook" to the fully qualified app name. This is done in the "flyte-binary.webhook.serviceName" definition in the _helpers.tpl file. If the service name isn't explicitly set in the Helm values file, this default value will be used.

If the issue persists after trying these solutions, please provide more context or error logs for further troubleshooting.

Sources

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
Jeinhaus commented 5 months ago

@eapolinario did you make any progress on this? I don't actually think this is a helm issue, since the generated config looks fine :thinking:

Jeinhaus commented 5 months ago

This might've been caused by a previous installation of the chart that created a MutatingWebhookConfiguration that was never cleaned up. We had a similar issue when changing from the flyte-binary to the flyte-core chart. Deleting the old/wrong MutatingWebhookConfiguration fixed this issue for us.