When partitioned assets depend on non-partitioned assets and both run on the same AutoMaterialise frequency, both assets are triggered at the same time but as different jobs. As a result, the partitioned assets do not respect the dependency chain and materialise before their upstream assets are fresh.
E.g. with A (non-partitioned) -> B (partitioned) and a cron rule that triggers hourly, A and B are both evaluated as needing a refresh at the same time, as expected. However, two jobs are created, and B materialises before A has refreshed.
When a job (instead of AutoMaterialise) is configured with these two assets, they are orchestrated as two sub-tasks within the same job with the dependency recognised, forcing B to run after A (if A succeeded).
What did you expect to happen?
Partitioned and non-partitioned assets should be scheduled within the same job as separate sub-tasks, respecting the dependencies in the materialisation sequence.
E.g. with A (non-partitioned) -> B (partitioned) and a cron rule that triggers hourly, A and B are evaluated as needing a refresh at the same time, and a single job including both assets is triggered with two sub-tasks, where A runs first and B runs once A succeeds.
How to reproduce?
Create two assets, A (non-partitioned) and B (partitioned), each with an AutoMaterializeRule.materialize_on_cron rule set to the same frequency.
B should be dependent on A.
Turn on auto-materialise.
Notice that both assets get scheduled at the same time as separate jobs.
Notice that B can start and/or finish materialising before A (and therefore materialise on incomplete or missing data).
Now disable auto-materialise and configure a job and schedule that include assets A and B. Notice that A and B get triggered as a single job with two sub-tasks, and B won't run until A has succeeded.
Deployment type
Dagster Helm chart
Deployment details
Verified in local and Helm deployments
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
I believe I am experiencing the same issue (https://github.com/dagster-io/dagster/issues/19357), though in my case AutoMaterialize does not affect whether the asset dependency chain is followed.
Dagster version
1.5.13