dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

AutoMaterialise splits partitioned and non-partitioned assets into separate jobs #20397

Open jPinhao-Rover opened 8 months ago

jPinhao-Rover commented 8 months ago

Dagster version

1.5.13

What's the issue?

When partitioned assets depend on non-partitioned assets and both are evaluated on the same AutoMaterialise frequency, the assets are triggered at the same time but as separate jobs. As a result, the partitioned assets do not respect the dependency chain and materialise before their upstream assets are fresh.

E.g. A (non-partitioned) -> B (partitioned), both with a Cron rule that triggers hourly: A and B are evaluated as needing a refresh at the same time, as expected. However, two jobs get created, and B materialises before A has refreshed.

When these two assets are configured in a job (instead of using AutoMaterialise), they are orchestrated as two sub-tasks within the same job with the dependency recognised, forcing B to run after A (if A succeeded).

What did you expect to happen?

Partitioned and non-partitioned assets to be scheduled within the same job as separate sub-tasks, respecting the dependencies in the materialisation sequence.

E.g. A (non-partitioned) -> B (partitioned), both with a Cron rule that triggers hourly: A and B are evaluated as needing a refresh at the same time, and a single job including both assets is triggered with two sub-tasks, where A runs first and B runs once A succeeds.

How to reproduce?

Now disable auto-materialise and configure a job and schedule that include assets A and B. Notice that A and B get triggered as a single job with two sub-tasks, and B won't run until A has succeeded.

Deployment type

Dagster Helm chart

Deployment details

Verified in local and Helm deployments

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

ryandstoughton commented 7 months ago

I believe I am experiencing the same issue (https://github.com/dagster-io/dagster/issues/19357), though in my case AutoMaterialize does not affect whether or not the asset dependency chain is followed.