dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Partitioned asset checks #17005

Open johannkm opened 1 year ago

johannkm commented 1 year ago

Enable checks per-partition, instead of just per-asset

johannkm commented 1 year ago

See https://github.com/dagster-io/dagster/discussions/17194 for an example of how to write a check on a partitioned asset currently.

the4thamigo-uk commented 10 months ago

Is there a plan on the roadmap to support this, or is it likely not to be supported in the near future?

johannkm commented 10 months ago

This is planned for the next few months

abhischekt commented 9 months ago

Hey @johannkm, do you have an update on partitioned asset checks? We are trying to work around this with https://github.com/dagster-io/dagster/discussions/17194#discussioncomment-7275029 but unfortunately it triggers checks for every partition when a single partition is materialized.

johannkm commented 9 months ago

@abhischekt we've made some progress with UI designs etc. but we aren't actively working on it. I think it's likely to ship in a few months but not sooner. A hack you could experiment with: you can access context.partition_key from inside an @asset_check if it's a partitioned run. No guarantees with this but you could use it to only check the partition you're materializing in the same run.

MattOates commented 4 months ago

+1 to the voices requesting this. All of our assets are partitioned. Some guidance on how to build out the checks would be a good idea too. The ideal model at the moment seems to be to generate check metadata within the asset, so that it's partitioned, and then have the checks do something very simple over that metadata. That is not how we planned to use them: the plan was for a QA person to have their own place to put their code, outside of the asset logic.
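A rough sketch of the model described above, written in plain Python rather than against the Dagster API: the asset computes a small summary for the partition it materializes, and the check applies a simple rule to that summary. The function names and the summary fields (`row_count`, `null_ids`) are hypothetical and chosen only for illustration.

```python
from typing import TypedDict


class PartitionSummary(TypedDict):
    # Hypothetical metadata fields recorded for one partition.
    row_count: int
    null_ids: int


def summarize_partition(rows: list[dict]) -> PartitionSummary:
    """Would run inside the asset: compute check metadata for one partition."""
    return {
        "row_count": len(rows),
        "null_ids": sum(1 for row in rows if row.get("id") is None),
    }


def check_summary(summary: PartitionSummary) -> bool:
    """Would run inside the check: a simple rule over the recorded metadata."""
    return summary["row_count"] > 0 and summary["null_ids"] == 0
```

The drawback noted above still applies: the summary logic lives with the asset, not in separately owned QA code.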

jtavernier commented 4 months ago

+1. Most of our assets are partitioned, and the current solutions are far from ideal.

Either the check is triggered inefficiently when backfilling multiple partitions, or the check results get overwritten by the last partition executed.
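The overwriting problem can be illustrated with a toy model. This is not how Dagster stores check results; it only contrasts a result keyed by check name with one keyed by check name and partition. All names and partition keys here are made up.

```python
# Toy model only: NOT Dagster's actual storage layer.
latest_by_check: dict[str, bool] = {}
latest_by_check_and_partition: dict[tuple[str, str], bool] = {}


def record(check_name: str, partition_key: str, passed: bool) -> None:
    # Per-asset check: a single slot, so each run replaces the previous result.
    latest_by_check[check_name] = passed
    # Per-partition check: one slot per (check, partition) pair.
    latest_by_check_and_partition[(check_name, partition_key)] = passed


# A backfill over two partitions: the first fails, the second passes.
record("row_count_positive", "2024-01-01", False)
record("row_count_positive", "2024-01-02", True)
```

After both calls, the per-check view holds only the passing result from the last partition, while the per-partition view still shows the failure for `2024-01-01`.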

astronautas commented 4 months ago

Bump. Most of our assets are partitioned as well, in particular ML models scored at a regular frequency. Until this is fixed, we will be defining custom downstream assets that are expected to crash if quality checks are not met for a given partition, blocking further downstream flow. But it would be great if such checks could be defined as part of the assets being checked, as post-conditions for successful materialization ;).
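A minimal sketch of such a gate, in plain Python. The idea is that the body of a partitioned downstream asset would call something like this and let the exception fail the run for that partition. The exception class, the function name, and the rules (a minimum row count, scores within [0, 1]) are all hypothetical.

```python
class QualityGateError(Exception):
    """Raised when a partition fails its quality gate."""


def quality_gate(partition_key: str, scores: list[float], min_rows: int = 1) -> list[float]:
    # Hypothetical rule 1: the partition must contain enough rows.
    if len(scores) < min_rows:
        raise QualityGateError(
            f"{partition_key}: expected at least {min_rows} rows, got {len(scores)}"
        )
    # Hypothetical rule 2: every score must lie within [0, 1].
    out_of_range = [s for s in scores if not 0.0 <= s <= 1.0]
    if out_of_range:
        raise QualityGateError(
            f"{partition_key}: {len(out_of_range)} score(s) outside [0, 1]"
        )
    # Pass the data through unchanged so downstream assets can depend on the gate.
    return scores
```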

gianfrancodemarco commented 2 months ago

Digging into the AssetCheckExecutionContext, I found that the partition key is available, so one can execute the checks only for the relevant partition:

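from dagster import AssetCheckExecutionContext, AssetCheckResult, asset_check

# my_asset, MyDataType, and some_function are placeholders defined elsewhere.
@asset_check(asset=my_asset)
def my_check(context: AssetCheckExecutionContext, partition_data: dict[str, MyDataType]) -> AssetCheckResult:
    # Read the partition key from the run tags.
    # This raises a KeyError if the run has no "dagster/partition" tag.
    partition_key = context.run.tags["dagster/partition"]
    # Check only the data for that partition.
    my_data = partition_data[partition_key]

    return AssetCheckResult(
        passed=some_function(my_data)
    )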

Obviously this is a temporary solution and Dagster should absolutely have support for checks on partitioned assets!
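One caveat with the snippet above: indexing the tags directly fails with a KeyError when the run carries no "dagster/partition" tag. A small helper, sketched here in plain Python, makes that case explicit. The helper name is hypothetical; only the tag name comes from the snippet.

```python
from typing import Mapping, Optional

PARTITION_TAG = "dagster/partition"  # tag name taken from the snippet above


def partition_key_from_tags(tags: Mapping[str, str]) -> Optional[str]:
    """Return the run's partition key, or None if the tag is absent."""
    return tags.get(PARTITION_TAG)
```

Inside the check it would be called as `partition_key_from_tags(context.run.tags)`. What to do when it returns None (skip, pass, or fail the check) is a choice left to the caller.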

AdamSkarboJonsson commented 1 month ago

Bump! This feature would be a game changer for us.

AlexandraDumitriu2 commented 6 days ago

Really looking forward to this feature!