dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Auto-materialization daemon re-runs all partitions if time window is extended to the past #21584

Open psarka opened 6 months ago

psarka commented 6 months ago

Dagster version

1.7.2

What's the issue?

If you have a partitioned asset whose partitions start on date D, let it auto-materialize, and then change the start date to D-1, the auto-materialization daemon reruns all the partitions.

What did you expect to happen?

I expect it to run only the new partition.

How to reproduce?

I can create a repro if needed.

Deployment type

None

Deployment details

No response

Additional information

I know why it happens. MaterializeOnMissingRule has a method get_handled_subset. That method returns three subsets "or"ed together. When the partition range is extended, one of them ends up with a different partitions_def from the others, and in that case the "or" method returns an empty subset. For example:

context.materialized_since_previous_tick_subset=ValidAssetSubset(asset_key=AssetKey(['abc123', 'daily_run']), value=PartitionKeysTimeWindowPartitionsSubset([]))
Daily, starting 2021-01-01 Etc/GMT-12.

context.previous_tick_requested_subset=ValidAssetSubset(asset_key=AssetKey(['abc123', 'daily_run']), value=TimeWindowPartitionsSubset([]))
Daily, starting 2021-01-02 Etc/GMT-12.

previous_handled_subset=AssetSubset(asset_key=AssetKey(['abc123', 'daily_run']), value=TimeWindowPartitionsSubset([PartitionKeyRange(start='2021-01-02', end='2021-01-03')]))
Daily, starting 2021-01-02 Etc/GMT-12.

# And the result of | is empty:
res=ValidAssetSubset(asset_key=AssetKey(['abc123', 'daily_run']), value=PartitionKeysTimeWindowPartitionsSubset([]))
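
To make the failure mode easier to see, here is a minimal toy model of it. This is not Dagster code: `DailyDef`, `Subset`, and `union_tolerant` are hypothetical stand-ins written for illustration, and the `__or__` below only imitates the behavior described above (an empty result when the two definitions differ). `union_tolerant` sketches one possible direction for a fix, under the assumption that previously handled keys that are still valid under the new definition should be kept.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyDef:
    """Hypothetical stand-in for a daily partitions definition."""
    start: date


@dataclass(frozen=True)
class Subset:
    """Hypothetical stand-in for an asset subset: a definition plus partition keys."""
    partitions_def: DailyDef
    keys: frozenset = frozenset()

    def __or__(self, other):
        # Imitates the reported behavior: mismatched definitions give an empty subset.
        if self.partitions_def != other.partitions_def:
            return Subset(self.partitions_def)
        return Subset(self.partitions_def, self.keys | other.keys)


def union_tolerant(new_def, *subsets):
    """Union that re-expresses every operand under new_def (assumed fix direction).

    Keys are kept only if they are still valid, i.e. on or after new_def.start.
    """
    keys = frozenset(
        key
        for subset in subsets
        for key in subset.keys
        if date.fromisoformat(key) >= new_def.start
    )
    return Subset(new_def, keys)


old_def = DailyDef(date(2021, 1, 2))  # definition before the change
new_def = DailyDef(date(2021, 1, 1))  # start date extended one day into the past

materialized = Subset(new_def)
requested = Subset(old_def)
previous_handled = Subset(old_def, frozenset({"2021-01-02", "2021-01-03"}))

# The strict "or" loses the previously handled partitions.
strict = materialized | requested | previous_handled

# The tolerant union keeps them, so only 2021-01-01 would look unhandled.
tolerant = union_tolerant(new_def, materialized, requested, previous_handled)
```

In this model `strict.keys` comes out empty, which mirrors why every partition looks unhandled after the start date moves, while `tolerant.keys` still contains the two partitions that were already handled.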

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

jamiedemaria commented 6 months ago

@OwenKephart assigning this issue to you, but I can also take a look and put up a PR if it doesn't look too complicated to fix.

OwenKephart commented 6 months ago

Hi @psarka, thanks for tracking down the source of this, and sorry you ran into it. We're in the middle of a large set of changes to the core system, which as a byproduct should make bugs like this harder to introduce accidentally. We can get a fix for this behavior into one of the next two releases.