Dagster version
1.6.2
What's the issue?
When I start a backfill for multiple assets that all have a single-run backfill policy, the downstream assets' partitions are not materialized in a single run. Instead, they are materialized across multiple runs, each covering an arbitrary chunk of partitions.
What did you expect to happen?
All assets' partitions are materialized in a single run, as specified by their backfill policy, regardless of differences in the start times of their partitions definitions.
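To make the expectation concrete, here is a minimal, stdlib-only sketch of how the selected partitions should be grouped into runs. `plan_runs` and `hourly_keys` are hypothetical helpers, not Dagster's implementation. The sketch assumes that a single-run policy means "no per-run limit" (which is how `BackfillPolicy.single_run()` is expressed, with `max_partitions_per_run=None`) and that hourly partition keys use the `%Y-%m-%d-%H:%M` format.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# ASSUMPTION: hourly partition keys look like "2024-01-01-00:00".
KEY_FORMAT = "%Y-%m-%d-%H:%M"


def hourly_keys(start: str, end: str) -> List[str]:
    """Return every hourly partition key from start to end, inclusive."""
    current = datetime.strptime(start, KEY_FORMAT)
    last = datetime.strptime(end, KEY_FORMAT)
    keys = []
    while current <= last:
        keys.append(current.strftime(KEY_FORMAT))
        current += timedelta(hours=1)
    return keys


def plan_runs(keys: List[str], max_partitions_per_run: Optional[int]) -> List[List[str]]:
    """Hypothetical planner: None means one run for all keys (single-run policy)."""
    if max_partitions_per_run is None:
        return [keys] if keys else []
    if max_partitions_per_run <= 0:
        raise ValueError("max_partitions_per_run must be positive")
    # Multi-run policy: evenly sized chunks of at most max_partitions_per_run keys.
    return [
        keys[i : i + max_partitions_per_run]
        for i in range(0, len(keys), max_partitions_per_run)
    ]


keys = hourly_keys("2024-01-01-00:00", "2024-01-31-23:00")
single = plan_runs(keys, None)
print(len(keys), len(single))  # 744 1
```

For contrast, `plan_runs(keys, 24)` yields 31 day-sized runs. Either policy gives a predictable layout; the arbitrary 3-hour to 15-day runs described in this issue match neither.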
How to reproduce?
To reproduce in a local dev environment:
There are two chains of assets: in one, the assets' `partitions_def` start times differ; in the other, they are identical.
1) Select the `root_buggy` asset, and in the lineage view, click "Downstream" and then "Materialize all...".
2) In my example, for the period to backfill, choose `[2024-01-01-00:00...2024-01-31-23]`. The "Backfill preview" view should look like this:
3) Once the backfill starts, it first backfills all the `root_buggy` partitions in a single run. After that, the `downstream_buggy` partitions are materialized in multiple runs, each covering an arbitrary time period. I have seen runs covering anywhere from 3 hours' worth of partitions, to 15 days' worth, to all partitions in a single run. It looks like this (there are more runs, of course; not all of them fit in the screenshot):
4) For comparison, start a backfill in exactly the same way, with the same partition range, for `root_working` and its downstream asset.
5) Once the backfill starts, it backfills both assets' partitions in a single run, as expected:
Deployment type
Dagster Helm chart
Deployment details
No response
Additional information
No response