flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[BUG] Flytepropeller bug for multicluster operation #4020

Open samuel-sujith opened 1 year ago

samuel-sujith commented 1 year ago

Describe the bug

I am running Flyte in multicluster mode and intermittently get the error below.

I followed the instructions here: https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html

Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "a52mgxmqsx682x9kkbzr": the object has been modified; please apply your changes to the latest version and try again

When I submit the job and it goes to the remote cluster, I get this error in flytepropeller on the remote data plane cluster after one or two nodes have executed. I think this means that the original workflow has been edited, and since that version doesn't exist in the remote cluster, it errors out and the node execution aborts.

The error comes from line 104 of workflowstore/passthrough.go in flytepropeller: logger.Errorf(ctx, "Failed to update workflow. Error [%v]", err)

But if I relaunch the execution, it generally runs without any issues. Is this a known issue with multicluster operation?
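For context, the message "the object has been modified; please apply your changes to the latest version and try again" is the text the Kubernetes API server returns when an update carries a stale resourceVersion (an optimistic concurrency conflict). The sketch below simulates that mechanism in plain Python. `FakeStore`, `Conflict`, and `update_with_retry` are hypothetical names made up for illustration; this is not flytepropeller code, and it does not establish whether the conflict is what aborts the node execution.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resource version."""


class FakeStore:
    """Hypothetical in-memory stand-in for an API server holding one object."""

    def __init__(self):
        self.resource_version = 1
        self.status = "Queued"

    def get(self):
        # Return a snapshot of the object: (resource_version, status).
        return self.resource_version, self.status

    def update(self, seen_version, new_status):
        # Reject the write if the caller read an older version of the object.
        if seen_version != self.resource_version:
            raise Conflict("the object has been modified")
        self.status = new_status
        self.resource_version += 1


def update_with_retry(store, new_status, attempts=3, before_write=None):
    """Re-read the object and retry the update when a conflict is reported."""
    for attempt in range(attempts):
        version, _ = store.get()
        if before_write is not None:
            before_write(attempt)  # hook used below to simulate a second writer
        try:
            store.update(version, new_status)
            return attempt + 1  # number of attempts used
        except Conflict:
            continue
    raise Conflict("gave up after %d attempts" % attempts)


store = FakeStore()


def concurrent_writer(attempt):
    # A second writer (for example, one adding a finalizer) updates the object
    # between our read and our write, but only on the first attempt.
    if attempt == 0:
        store.update(store.resource_version, "Queued+finalizer")


attempts_used = update_with_retry(store, "Running", before_write=concurrent_writer)
print(attempts_used, store.status, store.resource_version)
```

In this simulation the first write is rejected because the object changed after it was read, and the second attempt succeeds after re-reading. That would be consistent with a relaunch usually working, although the sketch alone cannot confirm it.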

Expected behavior

When I run in multicluster mode, the workflow should not fail randomly.

Additional context to reproduce

https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html

Screenshots

No response

samuel-sujith commented 1 year ago
{"json":{"src":"controller.go:157"},"level":"info","msg":"==\u003e Enqueueing workflow [samuel-predictive-azure-development/ahkz2dzvcvqvzrgh85pv]","ts":"2023-09-09T04:46:18Z"}
{"json":{"exec_id":"ahkz2dzvcvqvzrgh85pv","ns":"samuel-predictive-azure-development","res_ver":"26307647","routine":"worker-1","src":"executor.go:390","wf":"samuel-predictive-azure:development:predictive_azure.predictive_handover_workflow.predictive_handover_workflow"},"level":"info","msg":"Handling Workflow [ahkz2dzvcvqvzrgh85pv] Done","ts":"2023-09-09T04:46:19Z"}
{"json":{"exec_id":"ahkz2dzvcvqvzrgh85pv","ns":"samuel-predictive-azure-development","routine":"worker-1","src":"passthrough.go:80"},"level":"debug","msg":"Observed FlyteWorkflow Update (maybe finalizer)","ts":"2023-09-09T04:46:19Z"}
{"json":{"src":"event.go:282"},"level":"info","msg":"Event(v1.ObjectReference{Kind:\"FlyteWorkflow\", Namespace:\"samuel-predictive-azure-development\", Name:\"ahkz2dzvcvqvzrgh85pv\", UID:\"74ec839e-c936-4e77-a544-c558ab2b534a\", APIVersion:\"flyte.lyft.com/v1alpha1\", ResourceVersion:\"26307647\", FieldPath:\"\"}): type: 'Normal' reason: 'Running' Workflow began execution","ts":"2023-09-09T04:46:19Z"}
{"json":{"exec_id":"ahkz2dzvcvqvzrgh85pv","ns":"samuel-predictive-azure-development","routine":"worker-1","src":"passthrough.go:95"},"level":"error","msg":"Failed to update workflow. Error [Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com \"ahkz2dzvcvqvzrgh85pv\": the object has been modified; please apply your changes to the latest version and try again]","ts":"2023-09-09T04:46:19Z"}
{"json":{"exec_id":"ahkz2dzvcvqvzrgh85pv","ns":"samuel-predictive-azure-development","routine":"worker-1","src":"handler.go:299"},"level":"info","msg":"Completed processing workflow.","ts":"2023-09-09T04:46:19Z"}
E0909 04:46:19.048181       1 workers.go:102] error syncing 'samuel-predictive-azure-development/ahkz2dzvcvqvzrgh85pv': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "ahkz2dzvcvqvzrgh85pv": the object has been modified; please apply your changes to the latest version and try again

I enabled debug logging, and the output above is what I see.
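To pull the conflict errors out of a longer debug log, a small filter like the one below may help. It is a minimal sketch that assumes the structured lines are one JSON object per line with `level`, `msg`, `ts`, and a nested `json` field, as in the output above. `conflict_errors` is a hypothetical helper name, and `SAMPLE` holds three lines copied from that output.

```python
import json

# Three lines copied from the debug output in this issue.
SAMPLE = r'''
{"json":{"src":"controller.go:157"},"level":"info","msg":"==\u003e Enqueueing workflow [samuel-predictive-azure-development/ahkz2dzvcvqvzrgh85pv]","ts":"2023-09-09T04:46:18Z"}
{"json":{"exec_id":"ahkz2dzvcvqvzrgh85pv","ns":"samuel-predictive-azure-development","routine":"worker-1","src":"passthrough.go:95"},"level":"error","msg":"Failed to update workflow. Error [Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com \"ahkz2dzvcvqvzrgh85pv\": the object has been modified; please apply your changes to the latest version and try again]","ts":"2023-09-09T04:46:19Z"}
E0909 04:46:19.048181       1 workers.go:102] error syncing 'samuel-predictive-azure-development/ahkz2dzvcvqvzrgh85pv': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "ahkz2dzvcvqvzrgh85pv": the object has been modified; please apply your changes to the latest version and try again
'''


def conflict_errors(text):
    """Return (timestamp, source location, execution id) for each conflict error."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and klog-style lines such as "E0909 ..."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that only look like JSON
        is_error = entry.get("level") == "error"
        if is_error and "the object has been modified" in entry.get("msg", ""):
            meta = entry.get("json", {})
            hits.append((entry.get("ts"), meta.get("src"), meta.get("exec_id")))
    return hits


errors = conflict_errors(SAMPLE)
for ts, src, exec_id in errors:
    print(ts, src, exec_id)
```

Counting the results per execution ID over a longer capture would show how often the conflict occurs and whether it lines up with the failed runs.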

github-actions[bot] commented 3 months ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏