flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Core feature] Default task resource behavior should apply for node level overrides #5414

Open katrogan opened 1 month ago

katrogan commented 1 month ago

Motivation: Why do you think this is important?

The Flyte docs (https://docs.flyte.org/en/latest/user_guide/productionizing/customizing_task_resources.html#customizing-task-resources) describe how to configure task resources in the decorator and with the with_overrides syntax (https://docs.flyte.org/en/latest/user_guide/productionizing/customizing_task_resources.html#using-with-overrides).

However, the back-end behavior, documented at https://docs.flyte.org/en/latest/deployment/configuration/customizable_resources.html#task-resources, states:

Default values get injected as the task requests and limits when a task definition omits a specific resource.

Confusingly, this is not the case for node-level overrides: the defaults are not injected for resources that an override omits.
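To make the quoted rule concrete, here is a minimal sketch in plain Python of what "injecting defaults for omitted resources" could look like, and of the outcome this issue asks for when the same rule is applied to both code paths. It is a model for discussion only, not Flyte's back-end code: `ResourceSpec`, `inject_defaults`, and the values in `PLATFORM_DEFAULTS` are hypothetical names and numbers.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical stand-in for a set of resource values; None means 'not specified'."""
    cpu: Optional[str] = None
    mem: Optional[str] = None


def inject_defaults(spec: ResourceSpec, defaults: ResourceSpec) -> ResourceSpec:
    """Fill every resource omitted from `spec` with the corresponding default."""
    filled = {}
    for f in fields(spec):
        value = getattr(spec, f.name)
        filled[f.name] = value if value is not None else getattr(defaults, f.name)
    return ResourceSpec(**filled)


# Illustrative platform defaults (assumed values, not Flyte's real configuration).
PLATFORM_DEFAULTS = ResourceSpec(cpu="1", mem="1Gi")

# The same partial specification, once from the decorator and once from an override.
decorator_requests = ResourceSpec(cpu="2")
override_requests = ResourceSpec(cpu="2")

# Desired outcome: both paths go through the same default injection.
resolved_decorator = inject_defaults(decorator_requests, PLATFORM_DEFAULTS)
resolved_override = inject_defaults(override_requests, PLATFORM_DEFAULTS)
```

In this model the omitted `mem` value is filled from the defaults in both cases, so the decorator path and the override path resolve to the same specification.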

Goal: What should the final outcome look like, ideally?

Both node-level overrides and task resources specified in the decorator should receive the same treatment and validation in the back end.

Describe alternatives you've considered

Keeping the current implementation. It works, but it isn't intuitive.

runllm[bot] commented 1 month ago

The issue you're describing highlights a discrepancy between the Flyte documentation and the actual behavior of task resource overrides. Specifically, the default task resource values are not being applied consistently when using node-level overrides.

Would you like more detail on how to address this issue or further explanation on the current behavior and potential fixes?

eapolinario commented 1 month ago

Let's add a code example to clarify the expected behavior.

cosmicBboy commented 1 month ago

Suppose a user defines a task with @task(requests=Resources(cpu="2", mem="500Mi")).

When a user is paying for compute (e.g. in serverless), we don't want to unexpectedly provision more compute or memory than they asked for. Although this is what limits is for, relying on it requires the user to reason about the difference between requests and limits.

The desired behavior is that the user gets exactly what they ask for, whether in the @task definition or when using with_overrides in the context of a workflow.

Running these functional tests without the limits argument in with_overrides led to more memory being provisioned than was requested.
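One possible reading of "the user gets exactly what they ask for" is that any limit the user leaves out is pinned to the matching request, so nothing above the request can be provisioned. The sketch below models that reading in plain Python. It is an assumption made for illustration, not the behavior of Flyte or flytekit: `ResourceMap` and `pin_limits_to_requests` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical plain mapping of resource name to quantity, e.g. {"cpu": "2", "mem": "500Mi"}.
ResourceMap = Dict[str, str]


def pin_limits_to_requests(
    requests: ResourceMap, limits: Optional[ResourceMap] = None
) -> Tuple[ResourceMap, ResourceMap]:
    """Return (requests, limits) where every omitted limit equals its request."""
    resolved_limits = dict(requests)      # start from the requested amounts
    resolved_limits.update(limits or {})  # explicit limits from the user take precedence
    return dict(requests), resolved_limits


# Requests only, as in the task definition above: limits mirror the requests.
requests_only, limits_only = pin_limits_to_requests({"cpu": "2", "mem": "500Mi"})

# An explicit memory limit is kept, while the omitted CPU limit is pinned to the request.
_, limits_mixed = pin_limits_to_requests({"cpu": "2", "mem": "500Mi"}, {"mem": "1Gi"})
```

Under this reading the same resolution would apply whether the values come from the decorator or from with_overrides, so the user would not need to pass limits just to avoid extra memory being provisioned.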