flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Feature] Support serializing Scheduled Executions #420

Open EngHabu opened 4 years ago

EngHabu commented 4 years ago

Motivation: Why do you think this is important? With scheduled executions, users might expect that no more than one execution runs at a time (for example, when an older run takes a long time to finish).

Goal: What should the final outcome look like, ideally? Users can set an exactly-once flag when creating a schedule for a launch plan.


kumare3 commented 4 years ago

Exactly once means that something will be done exactly once, not twice and not zero times. @EngHabu Do you rather mean that only one scheduled execution runs at a time, effectively serializing the schedules?

EngHabu commented 4 years ago

Yes, exactly that.

kumare3 commented 2 years ago

this should dedupe - https://github.com/flyteorg/flyte/issues/267

kumare3 commented 2 years ago

Cc @EngHabu this should not be closed

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

kdhingra307 commented 2 months ago

@kumare3 any possibility of making this happen?

kumare3 commented 2 months ago

This is on the roadmap, but it cannot be prioritized right now as we are short-staffed. Please contribute!

kumare3 commented 2 months ago

Cc @EngHabu @eapolinario fyi

kumare3 commented 2 months ago

@kdhingra307 please share a motivating example / usecase

kdhingra307 commented 2 months ago

@kumare3 One use-case could involve processing a batch of data from a stream. For example, if we are at index 10 and want to process the next 500 samples, we increase the counter once the workflow is complete. However, if the workflow takes longer to finish than the frequency at which new workflows are launched, another workflow might start and process the same data.
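To make the overlap concrete, here is a minimal, self-contained simulation of that scenario. It is only a sketch under assumed numbers (a run fires every tick, takes three ticks, and commits the offset on completion); `simulate` and `batch` are hypothetical names for illustration and are not part of Flyte.

```go
package main

import "fmt"

// batch is the half-open index range [Start, End) that one run reads.
type batch struct{ Start, End int }

// simulate fires a run every `interval` ticks. Each run takes `duration`
// ticks, reads batchSize samples starting at the shared offset, and only
// advances that offset when it completes. With serialize set, a scheduled
// tick is skipped while an earlier run is still in flight.
func simulate(ticks, interval, duration, batchSize, startOffset int, serialize bool) []batch {
	type run struct {
		b      batch
		finish int
	}
	offset := startOffset
	var inFlight []run
	var started []batch
	for t := 0; t < ticks; t++ {
		// Retire finished runs and commit their progress to the offset.
		remaining := inFlight[:0]
		for _, r := range inFlight {
			if r.finish <= t {
				if r.b.End > offset {
					offset = r.b.End
				}
			} else {
				remaining = append(remaining, r)
			}
		}
		inFlight = remaining

		if t%interval != 0 {
			continue // not a scheduled tick
		}
		if serialize && len(inFlight) > 0 {
			continue // a previous run is still active, so skip this tick
		}
		b := batch{Start: offset, End: offset + batchSize}
		started = append(started, b)
		inFlight = append(inFlight, run{b: b, finish: t + duration})
	}
	return started
}

func main() {
	fmt.Println("overlapping:", simulate(6, 1, 3, 500, 10, false))
	fmt.Println("serialized: ", simulate(6, 1, 3, 500, 10, true))
}
```

Without serialization, the first three runs all start from index 10 and read the same 500 samples; with serialization, each range is read once.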

Also, I can try working on this.

kdhingra307 commented 2 months ago

@kumare3 I feel the logic to implement a skip feature would be relatively easy. We can add a check in the PrepareFlyteWorkflow function to see whether a copy of the same workflow is already running. I did a small POC locally and it seems to work fine.


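// Ask flyteadmin for at most one RUNNING execution of this launch plan.
// Project, domain, and launch plan name are hard-coded for the POC.
otpt, err := w.adminServiceClient.ListExecutions(ctx, &admin.ResourceListRequest{
    Id:      &admin.NamedEntityIdentifier{Project: "flytesnacks", Domain: "development"},
    Filters: "eq(launch_plan.name,my_cron_scheduled_lp)+value_in(phase,RUNNING)",
    Limit:   1,
})
if err != nil {
    logger.Errorf(ctx, "failed to list executions: %v", err)
    return nil
}
// Check the returned executions directly instead of the pagination token.
if len(otpt.GetExecutions()) == 0 {
    return nil
}
logger.Infof(ctx, "a previous execution is still running, skipping")
return nil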

Making the workflow wait for the previous one is a bit trickier. I feel it can be done by adding a condition in handleReadyWorkflow that keeps the workflow in the Ready phase until the previous one finishes, but I could not find a way to update the status in flyteconsole; it was stuck in the Unknown state. One way to handle this would be to create a new phase altogether. Also, I feel that with this change we can fix this issue too, but I will need some guidance as I am super new to the Flyte codebase.
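The two behaviours discussed here (skip the new run, or hold it until the previous one finishes) can be viewed as one policy decision. The sketch below is only an illustration of that idea: `ConcurrencyPolicy`, `Action`, and `decide` are hypothetical names, not existing Flyte types, and the count of running executions is assumed to come from a lookup like the one in the POC.

```go
package main

import "fmt"

// ConcurrencyPolicy is a hypothetical per-schedule setting.
type ConcurrencyPolicy int

const (
	Allow ConcurrencyPolicy = iota // today's behaviour: always launch
	Skip                           // drop the new run if one is active
	Wait                           // hold the new run until the active one ends
)

// Action is what the scheduler should do with a newly triggered run.
type Action int

const (
	Launch Action = iota
	Drop
	Queue
)

// decide maps a policy and the number of currently running executions
// of the same launch plan to an action for the new run.
func decide(policy ConcurrencyPolicy, running int) Action {
	if running == 0 || policy == Allow {
		return Launch
	}
	switch policy {
	case Skip:
		return Drop
	case Wait:
		return Queue
	}
	return Launch
}

func main() {
	fmt.Println(decide(Skip, 1) == Drop)
	fmt.Println(decide(Wait, 1) == Queue)
	fmt.Println(decide(Wait, 0) == Launch)
}
```

Keeping the decision separate from the lookup means the skip path and the wait path can share the same check, and only the `Queue` case needs the extra phase handling mentioned above.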

eapolinario commented 2 months ago

@kdhingra307, take a look at https://github.com/flyteorg/flyte/pull/5659, where we are discussing this idea.