sryza closed this issue 1 year ago
Yeah, I had the same issue! The quick fix for me was to disable automatic retries through the Dagster Cloud UI.
I was finally able to reproduce this. Minimal steps:
It appears that the source of the problem is that the execution plan for the retry ends up containing all steps, instead of just the step that should be retried.
Interestingly, the same error does not occur when launching a manual re-execution from failure.
Here's what gets passed to create_run in the auto-retry case (which is where the problem is):
asset_selection: frozenset({AssetKey(['multi_dynamic_downstream2'])})
solid_selection: None
execution_plan_snapshot.steps: ['multi_dynamic_downstream2', 'multi_dynamic_upstream2', 'non_partitioned_asset']
pipeline_snapshot.node_names: ['multi_dynamic_downstream2', 'multi_dynamic_upstream2', 'non_partitioned_asset']
Here's what gets passed to create_run in the manual retry case (which works correctly):
asset_selection: frozenset({AssetKey(['multi_dynamic_downstream2'])})
solid_selection: None
execution_plan_snapshot.steps: ['multi_dynamic_downstream2']
pipeline_snapshot.node_names: ['multi_dynamic_downstream2']
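To make the difference between the two cases concrete, here is a small plain-Python sketch (not Dagster code) that compares the step lists quoted above against the asset selection. The helper name unexpected_steps is hypothetical, and it assumes each step key matches the asset key it materializes, which holds for the names in this thread but not in general.

```python
# Values copied from the create_run arguments quoted in this thread.
asset_selection = {"multi_dynamic_downstream2"}

auto_retry_steps = [
    "multi_dynamic_downstream2",
    "multi_dynamic_upstream2",
    "non_partitioned_asset",
]
manual_retry_steps = ["multi_dynamic_downstream2"]


def unexpected_steps(plan_steps, selected_assets):
    """Return the plan steps that fall outside the asset selection.

    Hypothetical helper for illustration only. Assumes a step key is
    identical to the asset key it produces.
    """
    return sorted(set(plan_steps) - set(selected_assets))


# Auto-retry case: the plan carries two steps that were never selected.
print(unexpected_steps(auto_retry_steps, asset_selection))
# -> ['multi_dynamic_upstream2', 'non_partitioned_asset']

# Manual retry case: the plan matches the selection exactly.
print(unexpected_steps(manual_retry_steps, asset_selection))
# -> []
```

The asset_selection is identical in both cases, so the extra steps come from how the execution plan and snapshot are built for the auto-retry, not from the selection itself.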
"It appears that the source of the problem is that the execution plan for the retry ends up containing all steps, instead of just the step that should be retried."
I totally agree with that! Thanks @sryza for digging in!
When an automatic run retry is launched for a run launched by the asset reconciliation sensor, sometimes it hits an error like this:
Reported twice: