flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.17k stars, 550 forks

[BUG] Tasks from subworkflow calling reference launch plan read cache from different projects #5466

Open ysysys3074 opened 2 weeks ago

ysysys3074 commented 2 weeks ago

Describe the bug

In a sub-workflow that calls a reference launch plan, the tasks read cache from a different project.

1) The project for the current subworkflow (the one calling a reference launch plan) is yushang-test5.

[Screenshot 2024-06-11 at 12:02:39 AM]

2) However, the task from the reference launch plan reads from cache, and the source execution of that cache entry ran in the yushang-test4 project.

[Screenshot 2024-06-11 at 12:03:21 AM]

I did some investigation and found that the cache key for the execution is actually tied to the reference launch plan's identifier (the launch plan's project/version/domain/name), so all subworkflows calling the same launch plan will have the same cache key.

Expected behavior

Caching should only apply within the same project; different projects should not share cached results. That means we should not use the launch plan's identifier as the cache key, but instead fetch the identifier from the current task. Otherwise, our own tasks might read results from other people's executions. That said, I'm curious about the rationale behind the current design, so I can understand whether it makes sense.

Additional context to reproduce

No response

Screenshots

No response

runllm[bot] commented 2 weeks ago

The issue you're describing seems to be related to how Flyte handles caching across different projects when using reference launch plans. According to the Flyte documentation on remote caching, cache keys are composed of the Project, Domain, Cache Version, Task Signature, and Task Input Values. This design ensures that task executions are isolated by project and domain to prevent data corruption.
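As a rough illustration of the documented composition, the sketch below hashes the five listed components into a single key. It is not Flyte's actual implementation: the `cache_key` helper, the `development` domain, the cache version, and the task signature string are all assumptions made up for this example.

```python
import hashlib
import json


def cache_key(project, domain, cache_version, task_signature, inputs):
    """Hypothetical helper: combine the five documented components into one key.

    This only illustrates the idea of a composite key; it does not mirror
    how Flyte actually serializes or hashes these values.
    """
    payload = json.dumps(
        {
            "project": project,
            "domain": domain,
            "cache_version": cache_version,
            "task_signature": task_signature,
            "inputs": inputs,
        },
        sort_keys=True,  # stable ordering so equal inputs always hash the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same task, same inputs, but two different projects (domain, version, and
# signature are placeholder values chosen for the example).
key_test4 = cache_key("yushang-test4", "development", "1.0", "add(x: int) -> int", {"x": 1})
key_test5 = cache_key("yushang-test5", "development", "1.0", "add(x: int) -> int", {"x": 1})

print(key_test4 == key_test5)
```

Under this composition the two keys differ as long as the project that goes into the key differs. A cross-project hit like the one reported would therefore only occur if both executions supplied the same project value to the key.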

However, your observation indicates that the cache key is being derived from the reference launch plan's identifier, which includes the project, version, domain, and name of the launch plan. This could lead to cache hits across different projects, which is not the intended behavior.

Expected Behavior

Cache keys should be specific to the project and domain of the current task execution, not the reference launch plan. This would ensure that tasks do not inadvertently read cached results from executions in other projects.

Rationale

The current design might be intended to facilitate reusability and efficiency, but it compromises the isolation between projects. Changing the cache key derivation to include the current task's project and domain would align with the documented behavior and ensure data integrity.

Would you like more details on how to potentially address this issue or any specific part of the caching mechanism?

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

eapolinario commented 2 weeks ago

@ysysys3074, this is working as intended. This has deep implications for how Flyte entities (e.g. tasks and launch plans) are registered in the backend and how we refer to them. In the specific case of reference entities, they are tied to their original project+domain+name+version.
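To make that concrete, here is a minimal sketch of the idea in plain Python. It is not Flyte code: the `Identifier` and `ReferenceLaunchPlan` classes, the `effective_identifier` function, and the launch plan name, domain, and version are hypothetical, and it assumes the referenced launch plan was registered in yushang-test4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for an entity identifier."""
    project: str
    domain: str
    name: str
    version: str


@dataclass(frozen=True)
class ReferenceLaunchPlan:
    """Hypothetical reference entity: it only points at a registered identifier."""
    id: Identifier


def effective_identifier(calling_project: str, ref: ReferenceLaunchPlan) -> Identifier:
    # The calling project is deliberately ignored: a reference entity stays
    # tied to the project+domain+name+version it was originally registered under.
    return ref.id


# Assumed registration details, invented for the example.
ref_lp = ReferenceLaunchPlan(
    Identifier(project="yushang-test4", domain="development", name="my_lp", version="v1")
)

from_test4 = effective_identifier("yushang-test4", ref_lp)
from_test5 = effective_identifier("yushang-test5", ref_lp)

print(from_test4 == from_test5)
print(from_test5.project)
```

In this sketch, a caller in yushang-test5 resolves to the same identifier as a caller in yushang-test4, so anything keyed on that identifier would be shared between them.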