flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Housekeeping] Enqueue in subqueue only on actionable Pod phase #2856

Open hamersaw opened 2 years ago

hamersaw commented 2 years ago

Describe the issue

Currently the FlytePropeller workqueue contains two levels. The first is the queue FlytePropeller pulls workflows from to periodically evaluate them. The second is used as a batching queue to ensure fast downstream node evaluations while mitigating issues caused by high-frequency execution. FlytePropeller adds workflows to the subqueue when k8s resource updates are observed.

This means that every time a Pod phase is updated we re-evaluate the workflow, which may result in many unnecessary evaluations. For example, on a Pod transition from "QUEUED" to "INITIALIZING" there is nothing that FlytePropeller can do. It may make sense to only enqueue the workflow when k8s resources reach terminal phases (ex. "FAILED" or "SUCCEEDED") and FlytePropeller can actually schedule or reschedule downstream operations.
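
For illustration, a minimal sketch of what such a filter might look like. The enqueueWorkflow callback and the execution-id label are hypothetical, not FlytePropeller's actual API; only the corev1 phases are real (Succeeded and Failed are the terminal PodPhase values in k8s):

    package sketch

    import (
        corev1 "k8s.io/api/core/v1"
    )

    // isTerminalPodPhase reports whether the Pod has reached a phase that
    // FlytePropeller can act on by scheduling or rescheduling downstream work.
    func isTerminalPodPhase(phase corev1.PodPhase) bool {
        return phase == corev1.PodSucceeded || phase == corev1.PodFailed
    }

    // onPodUpdate is a hypothetical informer hook: it enqueues the owning
    // workflow only when the update is actionable, not on every phase change.
    func onPodUpdate(pod *corev1.Pod, enqueueWorkflow func(namespace, name string)) {
        if !isTerminalPodPhase(pod.Status.Phase) {
            return // nothing FlytePropeller can do yet; skip the subqueue
        }
        // In real code the workflow key would come from labels or owner
        // references; this label lookup is illustrative only.
        enqueueWorkflow(pod.Namespace, pod.Labels["execution-id"])
    }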

What if we do not do this?

The subqueue may be flooded with workflows, and FlytePropeller will evaluate them unnecessarily, incurring additional overhead.

Related component(s)

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

P3rcy-8685 commented 2 years ago

I would like to work on this issue @samhita-alla @hamersaw

P3rcy-8685 commented 2 years ago

I would love some guidance on this issue. I am familiar with the language, but this is my first time participating in open source projects.

daniel-shuy commented 2 years ago

@hamersaw Looking through the code, I see that the WorkerPool is getting the workflow namespace and name from the subqueue (https://github.com/flyteorg/flytepropeller/blob/26ad85757c57cda41bf23a2b054d49eccaa8145d/pkg/controller/workers.go#L39-L91) and calling Propeller.Handle() to reconcile the workflow.

Correct me if I'm wrong, but this seems contrary to your explanation, as Propeller.Handle() ignores terminated workflows (https://github.com/flyteorg/flytepropeller/blob/26ad85757c57cda41bf23a2b054d49eccaa8145d/pkg/controller/handler.go#L205-L211):

    if w.GetExecutionStatus().IsTerminated() {
        if HasCompletedLabel(w) && !HasFinalizer(w) {
            logger.Debugf(ctx, "Workflow is terminated.")
            // This workflow had previously completed, let us ignore it
            return nil
        }
    }
hamersaw commented 2 years ago

@daniel-shuy I think I may not have been clear enough in the description. So the workqueue retrieval that you linked to is for the top-level queue. When we create the queue we create a two-tiered structure. Basically, every time we want FlytePropeller to process a workflow it is added to the top-level queue.

Items are added to the subqueue when k8s resource changes are detected. This means that every time a Pod phase changes (ex. Initialized -> Queued) the item is added to the subqueue. Then periodically, items from the subqueue are batched and added to the top-level queue. This scheme was designed because in large workflows Pod updates can happen very frequently and we shouldn't be evaluating a workflow every few milliseconds.
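
A rough sketch of that two-tier shape, assuming plain client-go workqueues; the actual FlytePropeller queue implementation differs in its details:

    package sketch

    import (
        "time"

        "k8s.io/client-go/util/workqueue"
    )

    // runBatcher periodically drains the subqueue into the top-level queue, so
    // a burst of Pod updates for one workflow collapses into one evaluation.
    func runBatcher(topLevel, subQueue workqueue.Interface, interval time.Duration) {
        for range time.Tick(interval) {
            for subQueue.Len() > 0 {
                item, shutdown := subQueue.Get()
                if shutdown {
                    return
                }
                topLevel.Add(item) // Add de-duplicates keys already queued
                subQueue.Done(item)
            }
        }
    }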

So this issue was designed to further minimize the number of times FlytePropeller needs to evaluate the workflow. Now that I review this, I think we need more than terminal Pod phases, because we still want to update node phase statuses dynamically. But we only need to add the workflow to the subqueue if FlytePropeller can update the workflow using the Pod phase. So phase updates from Initializing -> Queued or Queued -> Running may not be useful. This is going to require some parsing through which Pod phases are actionable. Looking at the k8s Pod plugin may be a great place to start.
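
A hedged sketch of a transition-based predicate; the phases treated as actionable below are guesses for illustration, not the audited list:

    package sketch

    import corev1 "k8s.io/api/core/v1"

    // transitionIsActionable decides from the (old, new) phase pair whether
    // the workflow should be added to the subqueue. Auditing the k8s Pod
    // plugin, as suggested above, would determine the real set.
    func transitionIsActionable(oldPhase, newPhase corev1.PodPhase) bool {
        if oldPhase == newPhase {
            return false // no change, nothing new to record or schedule
        }
        switch newPhase {
        case corev1.PodRunning:
            return true // lets FlytePropeller update node status dynamically
        case corev1.PodSucceeded, corev1.PodFailed:
            return true // lets FlytePropeller schedule downstream nodes
        }
        return false
    }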

daniel-shuy commented 2 years ago

@hamersaw Thanks for the detailed explanation. Looks like this is even more complicated than I expected šŸ˜…

github-actions[bot] commented 6 months ago

Hello šŸ‘‹, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! šŸ™

hamersaw commented 6 months ago

commenting to keep open.

peterxcli commented 1 week ago

@hamersaw Hi! I'm currently working on this issue and have a quick question. Is there a way to retrieve the previous phase of a task pod directly, or should I manually record it when K8s resource changes are detected?

Thanks for your guidance!
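
For reference, a client-go informer's UpdateFunc receives both the old and new objects, so the previous Pod phase is available there without manual bookkeeping. A minimal sketch, independent of FlytePropeller's actual informer wiring:

    package sketch

    import (
        "log"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    func watchPodPhases(client kubernetes.Interface) {
        factory := informers.NewSharedInformerFactory(client, 0)
        factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
            UpdateFunc: func(oldObj, newObj interface{}) {
                oldPod := oldObj.(*corev1.Pod)
                newPod := newObj.(*corev1.Pod)
                if oldPod.Status.Phase != newPod.Status.Phase {
                    log.Printf("pod %s/%s: %s -> %s",
                        newPod.Namespace, newPod.Name,
                        oldPod.Status.Phase, newPod.Status.Phase)
                }
            },
        })
        // factory.Start and cache sync are omitted for brevity.
    }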