dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
Partitioned asset checks #17005

Open johannkm opened 11 months ago

johannkm commented 11 months ago

Enable checks per-partition, instead of just per-asset

johannkm commented 9 months ago

See https://github.com/dagster-io/dagster/discussions/17194 for an example of how to write a check on a partitioned asset currently.

the4thamigo-uk commented 7 months ago

Is there a plan on the roadmap to support this, or is it likely not to be supported in the near future?

johannkm commented 7 months ago

This is planned for the next few months

abhischekt commented 6 months ago

Hey @johannkm, do you have an update on partitioned asset checks? We are trying to work around this with https://github.com/dagster-io/dagster/discussions/17194#discussioncomment-7275029, but unfortunately it triggers checks for every partition when a single partition is materialized.

johannkm commented 6 months ago

@abhischekt we've made some progress with UI designs etc., but we aren't actively working on it. I think it's likely to ship in a few months, but not sooner. A hack you could experiment with: you can access context.partition_key from inside an @asset_check if it's a partitioned run. No guarantees with this, but you could use it to check only the partition you're materializing in the same run.
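A minimal sketch of the hack described above, for illustration only. The asset daily_events, the check daily_events_not_empty, and the helper partition_to_check are hypothetical names, and the use of has_partition_key on the check's context is an assumption (the comment above only mentions context.partition_key). The Dagster wiring is imported lazily inside build_definitions so the helper can be tried without Dagster installed.

```python
from types import SimpleNamespace
from typing import Optional


def partition_to_check(context) -> Optional[str]:
    """Return the run's partition key, or None for an unpartitioned run."""
    # Assumption: the check's context exposes has_partition_key.
    # getattr keeps the helper safe if it does not.
    if getattr(context, "has_partition_key", False):
        return context.partition_key
    return None


def build_definitions():
    """Wire the helper into Dagster (imported lazily so the helper runs standalone)."""
    from dagster import AssetCheckResult, DailyPartitionsDefinition, asset, asset_check

    @asset(partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"))
    def daily_events(context):
        # Hypothetical asset: returns the rows for one day.
        return [{"day": context.partition_key, "value": 1}]

    @asset_check(asset=daily_events)
    def daily_events_not_empty(context):
        key = partition_to_check(context)
        if key is None:
            # Not a partitioned run: nothing partition-specific to check.
            return AssetCheckResult(passed=True, metadata={"partition": "none"})
        row_count = 1  # Placeholder: count the rows stored for `key` here.
        return AssetCheckResult(
            passed=row_count > 0,
            metadata={"partition": key, "row_count": row_count},
        )

    return daily_events, daily_events_not_empty


# Stand-in context for trying the helper outside Dagster.
fake_context = SimpleNamespace(has_partition_key=True, partition_key="2024-01-02")
print(partition_to_check(fake_context))  # prints 2024-01-02
```

Because the check itself is still unpartitioned, its stored result reflects only the most recent run, so this narrows what gets checked but does not give per-partition check history.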

MattOates commented 1 month ago

+1 to the voices requesting this. All of our assets are partitioned. Some guidance on how to build out checks would be a good idea too. The ideal model at the moment seems to be generating the check metadata within the asset, so that it's partitioned, and then having the checks do something very simple over that metadata. That is not how we planned to use them: the plan was for a QA person to have their own place to put their code, outside of the asset logic.
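The "metadata in the asset, simple check over the metadata" model described above could be sketched roughly as follows. This is an illustration under assumptions, not a recommended pattern: the asset scores, the check scores_null_fraction, the metadata keys, and the 1% threshold are all hypothetical, and reading the metadata back through context.instance.get_latest_materialization_event is one assumed way to do it.

```python
from typing import Mapping, Tuple


def evaluate_partition_metadata(
    metadata: Mapping[str, float], max_null_fraction: float = 0.01
) -> Tuple[bool, str]:
    """Decide pass/fail from metadata the asset recorded for one partition."""
    row_count = metadata.get("row_count", 0)
    null_count = metadata.get("null_count", 0)
    if row_count == 0:
        return False, "partition is empty"
    null_fraction = null_count / row_count
    if null_fraction > max_null_fraction:
        return False, f"null fraction {null_fraction:.2%} exceeds {max_null_fraction:.2%}"
    return True, "ok"


def build_definitions():
    """Wire the helper into Dagster (imported lazily so the helper runs standalone)."""
    from dagster import (
        AssetCheckResult,
        AssetKey,
        DailyPartitionsDefinition,
        asset,
        asset_check,
    )

    @asset(partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"))
    def scores(context):
        rows = [{"score": 0.9}, {"score": None}]  # Placeholder data.
        # Record per-partition statistics on the materialization itself.
        context.add_output_metadata(
            {
                "row_count": len(rows),
                "null_count": sum(r["score"] is None for r in rows),
            }
        )
        return rows

    @asset_check(asset=scores)
    def scores_null_fraction(context):
        # Assumption: the latest materialization is the partition just written.
        event = context.instance.get_latest_materialization_event(AssetKey("scores"))
        materialization = event.asset_materialization if event else None
        if materialization is None:
            return AssetCheckResult(passed=False, metadata={"reason": "no materialization found"})
        recorded = {name: value.value for name, value in materialization.metadata.items()}
        passed, reason = evaluate_partition_metadata(recorded)
        return AssetCheckResult(
            passed=passed,
            metadata={"partition": str(materialization.partition), "reason": reason},
        )

    return scores, scores_null_fraction
```

Keeping the pass/fail rule in a plain function lets the QA logic live outside the asset, while the asset only records numbers. Note that "latest materialization" may not be the intended partition when several partitions are materialized concurrently, for example during a backfill.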

jtavernier commented 1 month ago

+1. Most of our assets are partitioned, and the current solutions are far from ideal.

Either the check is triggered inefficiently when backfilling multiple partitions, or the check results get overwritten by the last partition executed.

astronautas commented 1 month ago

Bump. Most of our assets are partitioned as well, in particular ML models scored at a regular frequency. Until this is fixed, we will be defining custom downstream assets that are expected to fail if quality checks are not met for a given partition, blocking further downstream flow. But it would be great if such checks could be defined as part of the assets being checked, as post-conditions for successful materialization ;).
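The downstream-asset workaround described above might look something like this sketch. The assets model_scores and validated_model_scores, the helper require_valid_scores, and the [0, 1] score range are hypothetical and chosen only for illustration. Dagster is imported lazily inside build_definitions so the validation helper can be tried on its own.

```python
from typing import Sequence


def require_valid_scores(scores: Sequence[float], partition_key: str) -> Sequence[float]:
    """Raise if any score falls outside [0, 1]; otherwise pass the scores through."""
    bad = [s for s in scores if not 0.0 <= s <= 1.0]
    if bad:
        raise ValueError(f"partition {partition_key}: {len(bad)} score(s) outside [0, 1]")
    return scores


def build_definitions():
    """Wire the helper into Dagster (imported lazily so the helper runs standalone)."""
    from dagster import DailyPartitionsDefinition, asset

    daily = DailyPartitionsDefinition(start_date="2024-01-01")

    @asset(partitions_def=daily)
    def model_scores(context):
        return [0.2, 0.7]  # Placeholder scores for one partition.

    @asset(partitions_def=daily)
    def validated_model_scores(context, model_scores):
        # Shares the upstream partitioning, so each run validates one partition.
        # An exception here fails this step for that partition.
        return require_valid_scores(model_scores, context.partition_key)

    return model_scores, validated_model_scores
```

Anything that should be blocked on quality would then depend on validated_model_scores rather than on model_scores directly.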