flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[BUG] Child workflows from aborted parents reporting UNDEFINED phase #1733

Open katrogan opened 3 years ago

katrogan commented 3 years ago

Describe the bug

User reports:

Everything from our system that has no phase looks like this:

{"executions":[{"id":{"project":"sp-one-model","domain":"production","name":"fbc3v1iy"},"spec":{"launch_plan":{"resource_type":"LAUNCH_PLAN","project":"sp-one-model","domain":"production","name":"workflows.onemodel.premium_revenue.workflow.PremiumRevenueForecast_lp","version":"dfbdc72bb88978e3f2b8d0131445aa30f271dcb9"},"metadata":{"mode":"CHILD_WORKFLOW","principal":"unknown","nesting":1,"parent_node_execution":{"node_id":"premium-revenue-forecast","execution_id":{"project":"sp-one-model","domain":"production","name":"mo2b05rnm2"}},"system_metadata":{"execution_cluster":"flyte-production-regional"}}},"closure":{"abort_metadata":{"cause":"cascading abort as parent execution id [mo2b05rnm2] aborted, reason [Some node execution failed, auto-abort.]"},"created_at":"2021-04-12T20:50:18.120569115Z","updated_at":"2021-04-12T20:50:18.120569115Z","workflow_id":{"resource_type":"WORKFLOW","project":"sp-one-model","domain":"production","name":"workflows.onemodel.premium_revenue.workflow.PremiumRevenueForecast","version":"dfbdc72bb88978e3f2b8d0131445aa30f271dcb9"}}}],"token":"1"}
with the common theme being they all seem to be due to a manual abort of the parent execution ("abort_metadata":{"cause":"cascading abort as parent execution id [mo2b05rnm2] aborted, reason [Some node execution failed, auto-abort.]"})

Expected behavior

Aborting a parent execution should result in each child workflow execution's phase showing as ABORTED as well.
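
To make the mismatch easier to spot, here is a minimal sketch that scans a list response shaped like the payload above and flags executions that carry `abort_metadata` but do not report an `ABORTED` phase. It assumes that a missing `phase` key in the closure is how an UNDEFINED phase shows up in the JSON (the reported payload has no `phase` key). The function name `aborted_without_phase` and the sample data are hypothetical, not part of any Flyte API.

```python
import json
from typing import List

# Hypothetical, trimmed sample in the shape of the reported list response.
SAMPLE = json.dumps({
    "executions": [
        {   # child aborted via cascade, but no "phase" key in the closure
            "id": {"project": "X", "domain": "production", "name": "fbc3v1iy"},
            "closure": {"abort_metadata": {"cause": "cascading abort as parent execution id [mo2b05rnm2] aborted"}},
        },
        {   # what the issue expects: abort metadata and an ABORTED phase
            "id": {"project": "X", "domain": "production", "name": "expected1"},
            "closure": {"phase": "ABORTED", "abort_metadata": {"cause": "example"}},
        },
    ]
})


def aborted_without_phase(payload: str) -> List[str]:
    """Return names of executions that have abort_metadata but no ABORTED phase."""
    flagged = []
    for execution in json.loads(payload).get("executions", []):
        closure = execution.get("closure", {})
        # Assumption: an absent "phase" key means the phase is UNDEFINED.
        phase = closure.get("phase", "UNDEFINED")
        if "abort_metadata" in closure and phase != "ABORTED":
            flagged.append(execution["id"]["name"])
    return flagged


print(aborted_without_phase(SAMPLE))
```

Running a check like this over saved list responses would show whether every affected execution is a cascaded child abort or whether other paths also leave the phase unset.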

Additional context to reproduce

slack thread: https://flyte-org.slack.com/archives/CNMKCU6FR/p1634850738011500?thread_ts=1634845045.008600&cid=CNMKCU6FR

Screenshots

No response

katrogan commented 3 years ago

cc @pmahindrakar-oss since you've looked into related issues before

katrogan commented 3 years ago

observed with: flytepropeller v0.14.6 flyteadmin v0.6.33

kumare3 commented 2 years ago

@katrogan I do not follow this issue. Can you please elaborate?

katrogan commented 2 years ago

@kumare3 see the linked thread:

awesome thanks! tangential question, what results in an UNDEFINED phase? Everything from our system that has no phase looks like this

{"executions":[{"id":{"project":"X","domain":"production","name":"fbc3v1iy"},"spec":{"launch_plan":{"resource_type":"LAUNCH_PLAN","project":"sp-one-model","domain":"production","name":"X","version":"dfbdc72bb88978e3f2b8d0131445aa30f271dcb9"},"metadata":{"mode":"CHILD_WORKFLOW","principal":"unknown","nesting":1,"parent_node_execution":{"node_id":"N","execution_id":{"project":"X","domain":"production","name":"mo2b05rnm2"}},"system_metadata":{"execution_cluster":"flyte-production"}}},"closure":{"abort_metadata":{"cause":"cascading abort as parent execution id [mo2b05rnm2] aborted, reason [Some node execution failed, auto-abort.]"},"created_at":"2021-04-12T20:50:18.120569115Z","updated_at":"2021-04-12T20:50:18.120569115Z","workflow_id":{"resource_type":"WORKFLOW","project":"X","domain":"production","name":"X","version":"dfbdc72bb88978e3f2b8d0131445aa30f271dcb9"}}}],"token":"1"}
with the common theme being they all seem to be due to a manual abort of the parent execution ("abort_metadata":{"cause":"cascading abort as parent execution id [mo2b05rnm2] aborted, reason [Some node execution failed, auto-abort.]"})

Is there anything else that can result in an UNDEFINED phase?

The abort cause (the cascading abort) is recorded, but the child workflow phase is UNDEFINED.

github-actions[bot] commented 1 year ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, this issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 3 months ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏