bmarcj opened 1 year ago
cc @bengotow @clairelin135
There's a proposal here that would address this: https://github.com/dagster-io/dagster/discussions/14829.
I believe @ruizh22 is planning to work on this.
That looks like a good proposal. It would be great to have a cleaner separation between partitioning of the data (a data modelling decision) and computation across the data (driven by practical considerations like memory, CPU and cost). Currently the two are not separated, which is why the single-run vs. multi-run choice is even an issue.
What's the use case?
Currently any partitioned asset can be backfilled as a single run or as multiple runs. The latter launches one run per partition, while the former combines multiple partitions into a single run. The former is useful when individual partitions are small, or when the work is offloaded to something like Snowflake and splitting it by partition serves no purpose.
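To make the distinction concrete, here is a simplified, hypothetical model of how a backfill over an ordered list of partition keys could be split into runs. plan_backfill_runs is an illustrative name, not Dagster's actual backfill logic, and it assumes the selected keys are contiguous so that a single run can be described by its first and last key.

```python
from typing import List, Tuple


def plan_backfill_runs(partition_keys: List[str], single_run: bool) -> List[Tuple[str, str]]:
    """Hypothetical helper: group ordered partition keys into runs.

    Each run is described by an inclusive (start, end) key range.
    Assumes the keys are sorted and contiguous.
    """
    if not partition_keys:
        return []
    if single_run:
        # Single-run backfill: one run covering the whole range of keys.
        return [(partition_keys[0], partition_keys[-1])]
    # Multi-run backfill: one run per partition, so start == end for each run.
    return [(key, key) for key in partition_keys]
```

For three daily partitions, the single-run plan is one run over the full range, while the multi-run plan is three runs whose ranges each contain exactly one key. Code that only ever reads a single partition key implicitly assumes the second shape.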
Whether or not a single-run backfill is supported depends on the exact details of the asset and IO manager code: https://docs.dagster.io/concepts/partitions-schedules-sensors/backfills#single-run-backfills
It's not clear what the consequences are if the code uses partition_key, asset_partition_key, asset_partition_key_for_output or asset_partition_key_for_input instead of their window/range equivalents. In the best case, presumably an error; in the worst case, the job might silently succeed while producing unintended behaviour. There is no way to disable this functionality, so users are left relying on what they can guess or remember about how individual assets are computed, loaded and persisted. Meanwhile, the developer has no easy way of communicating the intended or optimal choice.
The best workaround right now is for IO managers and @asset functions to explicitly check that the context involves only a single partition, which prevents single runs across multiple partitions. It would be a good idea if developers could declare this information so that the GUI (or CLI) can forbid or allow only particular ways of backfilling.
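A minimal sketch of that workaround, assuming the check receives an object with start and end attributes. Dagster's PartitionKeyRange has this shape, so the partition key range exposed on an execution or IO manager context could be passed in. KeyRange, MultiPartitionRunError and require_single_partition are hypothetical names for illustration, not Dagster APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRange:
    """Hypothetical stand-in for a partition key range with inclusive start/end keys."""
    start: str
    end: str


class MultiPartitionRunError(RuntimeError):
    """Raised when code that only supports one partition per run sees a range."""


def require_single_partition(key_range) -> str:
    """Return the single partition key, or fail loudly if the run spans several.

    `key_range` is any object with `start` and `end` attributes.
    """
    if key_range.start != key_range.end:
        # Fail instead of silently processing only part of the requested range.
        raise MultiPartitionRunError(
            f"Expected a single partition but got {key_range.start}..{key_range.end}; "
            "launch this backfill as multiple runs instead."
        )
    return key_range.start
```

Calling this at the top of an asset body or an IO manager's handle_output/load_input turns the silent-misbehaviour case into an explicit error, but the constraint is still only discoverable at run time, which is the gap this issue describes.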
Ideas of implementation
Add an enumeration to the @asset decorator so that developers can explicitly allow or disable single-run or multi-run backfills, which the GUI can then present to users.
Additional information
No response
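A rough sketch of the enumeration idea from "Ideas of implementation", written in plain Python so it runs on its own. BackfillMode, backfill_mode and allowed_strategies are hypothetical names, not an existing Dagster API; a real implementation would presumably take the setting as an argument to @asset itself rather than via a separate decorator.

```python
from enum import Enum
from typing import Callable, Set


class BackfillMode(Enum):
    """Hypothetical: which backfill strategies an asset's author permits."""
    SINGLE_RUN = "single_run"  # all selected partitions in one run
    MULTI_RUN = "multi_run"    # one partition per run
    ANY = "any"                # the author does not restrict the choice


def backfill_mode(mode: BackfillMode) -> Callable:
    """Hypothetical decorator that records the permitted mode on the asset function."""
    def wrap(fn: Callable) -> Callable:
        fn.backfill_mode = mode
        return fn
    return wrap


def allowed_strategies(fn: Callable) -> Set[BackfillMode]:
    """Strategies a GUI or CLI could offer when launching a backfill of `fn`."""
    mode = getattr(fn, "backfill_mode", BackfillMode.ANY)
    if mode is BackfillMode.ANY:
        return {BackfillMode.SINGLE_RUN, BackfillMode.MULTI_RUN}
    return {mode}


@backfill_mode(BackfillMode.MULTI_RUN)
def daily_events():
    ...
```

Undecorated assets fall back to ANY, so the annotation would be opt-in and existing behaviour (both options offered) would be unchanged.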