Open: kumare3 opened this issue 3 years ago
This issue should get a spec first.
@kumare3 I'm interested to know whether there has been any progress on this.
I think that if Flyte manages to break the "each task is a container" model, it will really differentiate Flyte from other workflow orchestration tools. In recent years those have all leaned heavily toward making each task a container, with the possible exception of GitHub Actions workflows, which run every step of a job on the same runner.
@thesuperzapper Sadly, no progress yet. We do have a design, but no RFC. Let's hope for next year. Definitely join Slack and let's have a chat. There are already container-less backend plugins in Flyte; check those out.
@kumare3 I'm curious whether there is any update on this ticket; it is very important for our current project.
@mahanh We have this working in a prototype. Please ping me on Slack; we would love to understand more.
To describe how we would use this feature:
We often have 20+ tasks in one sub-workflow that each take either 10-15 minutes or about 1 second (if skipped). Each one really just runs a single executable, but it still has to run because the skip-check logic lives in C#. Coalescing would greatly speed up the skipped case, since starting a single process and exiting is far faster than getting Kubernetes to schedule a new pod.
We might be able to use a Flyte agent with long-running "sync" tasks or something similar (it seems we could just call `sp.Popen` or the like and block), but that seems like an over-engineered solution for what should probably be a platform feature: essentially a reusable "object pool", but for pods.
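The "just run it in a subprocess and block" idea above can be sketched in plain Python. This is a minimal illustration under assumptions, not a Flyte feature or agent API: the function name `run_steps_in_one_process` and the demo commands are hypothetical, and in practice each step would be the C# executable doing its own skip check and exiting early when there is nothing to do.

```python
from __future__ import annotations

import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    returncode: int
    seconds: float


def run_steps_in_one_process(steps: list[tuple[str, list[str]]]) -> list[StepResult]:
    """Run each (name, argv) step as a child process, one after another.

    All steps reuse the same already-running container, so a step that exits
    immediately (e.g. because its own skip check finds nothing to do) costs
    only a process start, not a new pod being scheduled.
    """
    results: list[StepResult] = []
    for name, argv in steps:
        start = time.monotonic()
        # subprocess.run blocks until the child exits, like Popen(...).wait().
        completed = subprocess.run(argv, check=False)
        results.append(StepResult(name, completed.returncode, time.monotonic() - start))
        if completed.returncode != 0:
            # Stop at the first failure, as a sequential workflow would.
            break
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for the real executables: each one just exits.
    demo_steps = [
        (f"step-{i}", [sys.executable, "-c", "raise SystemExit(0)"])
        for i in range(3)
    ]
    for r in run_steps_in_one_process(demo_steps):
        print(f"{r.name}: exit={r.returncode} in {r.seconds:.2f}s")
```

Stopping on the first non-zero exit code mirrors a linear chain of dependent tasks; a real coalescing feature would also need to report per-step status back to the platform so the execution graph still looks the same.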
We also often need small "bash-style" scripts to quickly move or rename a file, make a small request, and similar chores. These only really require plain Python anyway, so it would be great to be able to coalesce them too.
@kumare3 We are looking for a Dagster alternative, and this is a blocker for us. Our workload today, in Flyte terminology, is a workflow with 10 tasks, all fairly simple, that call other services of ours over HTTP. We sometimes have 50K runs concurrently, which would come to 500K pods :/ In Dagster there is one pod per workflow, so it's manageable.
What is the status of this, or is there an alternative?
This is available in Union today. In Flyte you can do this using agents: agents are long-running services, so one pod serves many, many tasks, which works well when those tasks are essentially API calls. In Union we can automatically combine tasks onto one container.
If it's just API calls, you can use agents; happy to jump on a call. Go to slack.flyte.org and ping me. For Union, see https://docs.union.ai/byoc/core-concepts/actors
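The agent pattern described above (one long-running process serving many lightweight, API-call-like tasks) can be illustrated generically with `asyncio`. This is a sketch of the idea only, not Flyte's agent interface: `fake_api_call`, `serve`, and the `max_in_flight` limit are hypothetical, and `asyncio.sleep` stands in for a real HTTP request.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical task type: an async function performing one lightweight
# operation, standing in for an HTTP call to another service.
Task = Callable[[int], Awaitable[int]]


async def fake_api_call(x: int) -> int:
    # Stand-in for an HTTP request; a real service would await an HTTP client here.
    await asyncio.sleep(0.01)
    return x * 2


async def serve(task: Task, inputs: list[int], max_in_flight: int = 100) -> list[int]:
    """Run many task invocations inside one long-lived process.

    A semaphore caps concurrency so the single process is not overwhelmed;
    asyncio.gather returns results in input order.
    """
    sem = asyncio.Semaphore(max_in_flight)

    async def one(x: int) -> int:
        async with sem:
            return await task(x)

    return await asyncio.gather(*(one(x) for x in inputs))


if __name__ == "__main__":
    results = asyncio.run(serve(fake_api_call, list(range(1000))))
    print(len(results), results[:5])
```

Because the tasks spend nearly all their time waiting on I/O, a single event loop can keep hundreds of them in flight, which is why one pod can stand in for many per-task pods in this kind of workload.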
Motivation: Why do you think this is important? FlytePropeller schedules each task as a new container instantiation today. Not all tasks are alike, but for some, such as simple container tasks, it is possible to run subsequent container steps on the same node, which avoids the penalty of storing data in a backend store and hydrating it again. This should be available as an optimization without any code changes for the user, and the execution graph should still look the same.
Goal: What should the final outcome look like, ideally? As a user writing the graph, I do not think about writing data to an object store and then reading it back; I just expect intermediate datasets (between subsequent steps) to be stored durably. As a platform owner, I would like to avoid round-tripping data between subsequent steps. This should greatly improve performance and reduce the number of pods scheduled on K8s.
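To make the round-trip argument concrete, here is a toy model comparing one pod per task with a coalesced linear chain. Everything in it is hypothetical: `FakeStore` stands in for an object store such as S3 or GCS, the counters stand in for pod scheduling and data transfers, and it does not reflect how FlytePropeller actually works. In the coalesced version each intermediate output is still written durably, as the Goal asks, but the next step receives it from memory instead of reading it back from the store.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

Step = Callable[[Any], Any]


@dataclass
class Counters:
    pods: int = 0       # pods scheduled
    uploads: int = 0    # writes to the durable store
    downloads: int = 0  # reads back from the durable store


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for a durable object store."""
    blobs: dict[str, Any] = field(default_factory=dict)


def run_one_pod_per_task(steps: list[Step], x: Any, store: FakeStore, c: Counters) -> Any:
    """Per-task model: each task is its own pod and hydrates its input from the store."""
    for i, step in enumerate(steps):
        c.pods += 1
        if i > 0:
            x = store.blobs[f"out-{i - 1}"]  # download the previous task's output
            c.downloads += 1
        x = step(x)
        store.blobs[f"out-{i}"] = x          # upload so the next pod can read it
        c.uploads += 1
    return x


def run_coalesced(steps: list[Step], x: Any, store: FakeStore, c: Counters) -> Any:
    """Coalesced model: the whole linear chain shares one pod."""
    c.pods += 1
    for i, step in enumerate(steps):
        x = step(x)                          # next step gets this value in memory
        store.blobs[f"out-{i}"] = x          # still persisted durably...
        c.uploads += 1                       # ...but never read back within the chain
    return x


if __name__ == "__main__":
    steps: list[Step] = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    for runner in (run_one_pod_per_task, run_coalesced):
        c = Counters()
        result = runner(steps, 5, FakeStore(), c)
        print(runner.__name__, result, c)
```

For this three-step chain both runners produce the same result, but the per-task runner schedules 3 pods and performs 2 downloads, while the coalesced runner schedules 1 pod and performs none; the 3 durable uploads are the same in both.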
Describe alternatives you've considered: The upcoming intra-task checkpointing feature should let users create checkpoints within a task, which would effectively resolve the above problem. In that scenario, however, the onus of managing state and resuming falls on the user. Ideally this would be abstracted away from the user and simply handled by the platform.
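For comparison, the user-managed alternative might look like the following: the task itself saves its progress after every step and resumes from the last saved step on retry. This is a hypothetical sketch using a local JSON file, not flytekit's checkpointing API; `run_with_checkpoint` and the file layout are assumptions, and values must be JSON-serializable.

```python
from __future__ import annotations

import json
import os
import tempfile
from typing import Any, Callable


def run_with_checkpoint(steps: list[Callable[[Any], Any]], x: Any, ckpt_path: str) -> Any:
    """Run steps in order, resuming after the last step recorded in ckpt_path.

    This is the bookkeeping that falls on the user with intra-task
    checkpointing: save state after each step, restore it on retry.
    """
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            saved = json.load(f)
        start, x = saved["next_step"], saved["value"]
    for i in range(start, len(steps)):
        x = steps[i](x)
        # Write to a temp file, then atomically replace, so a crash
        # mid-write cannot leave a corrupted checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_step": i + 1, "value": x}, f)
        os.replace(tmp, ckpt_path)
    return x


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_double(v: int) -> int:
        # Fails on the first attempt to simulate a pod eviction mid-task.
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("simulated eviction")
        return v * 2

    steps = [lambda v: v + 1, flaky_double, lambda v: v - 3]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        try:
            run_with_checkpoint(steps, 5, path)
        except RuntimeError as e:
            print("first attempt failed:", e)
        print("resumed result:", run_with_checkpoint(steps, 5, path))
```

Details such as the atomic temp-file-then-`os.replace` write, choosing a serialization format, and deciding what counts as a completed step are exactly the kind of state management the issue would prefer the platform to own.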
Flyte component
Additional context: This has the potential to greatly speed up many simple, linear DAGs and to make multi-step workflows performant, and thus more attractive to users.
Is this a blocker for you to adopt Flyte? N/A