dagster-io / hooli-data-eng-pipelines

Example Dagster Cloud code for the Hooli Data Engineering organization.

update dagster asset check to fail "better" #65

Closed by cnolanminich 5 months ago

cnolanminich commented 6 months ago

This PR updates the asset check on the RAW_DATA.users asset in a few ways

image

I'll pull this out of draft mode once we get some consensus on the metadata output for this.

github-actions[bot] commented 6 months ago

Your pull request is automatically being deployed to Dagster Cloud.

| Location | Status | Link | Updated |
| --- | --- | --- | --- |
| demo_assets | | View in Cloud | Mar 11, 2024 at 06:22 PM (UTC) |
| snowflake_insights | | View in Cloud | Mar 11, 2024 at 06:22 PM (UTC) |
| basics | | View in Cloud | Mar 11, 2024 at 06:22 PM (UTC) |
| data-eng-pipeline | | View in Cloud | Mar 11, 2024 at 06:22 PM (UTC) |
| batch_enrichment | | View in Cloud | Mar 11, 2024 at 06:22 PM (UTC) |

slopp commented 6 months ago

Yeah, I like the idea of using markdown output metadata in an asset check, but I think this particular data frame is fairly confusing: what you are presenting isn't really a dataframe, because the rows aren't meaningful across columns.

I think of this data check more as looking for bad company data, not necessarily checking that every expected company was observed. So maybe we could simplify it as:

MetadataValue.md(f"Observed the following unexpected companies: {list(unique_companies - expected_companies)}") 
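
A minimal sketch of the simplification suggested above, assuming `unique_companies` and `expected_companies` are plain Python sets. The function name, the sample company names, and the "no unexpected companies" message are hypothetical and not taken from the PR:

```python
def summarize_unexpected_companies(observed, expected):
    """Return (passed, markdown_message) for a company-name check."""
    # Set difference: values seen in the data that are not on the expected list.
    # Sorting keeps the message stable from run to run.
    unexpected = sorted(set(observed) - set(expected))
    passed = not unexpected
    if passed:
        # Hypothetical wording for the passing case (an assumption).
        message = "No unexpected companies observed."
    else:
        message = f"Observed the following unexpected companies: {unexpected}"
    return passed, message


# Hypothetical sample values, for illustration only.
expected_companies = {"ShopMart", "FoodCo"}
unique_companies = {"ShopMart", "FoodCo", "Acme"}

passed, message = summarize_unexpected_companies(unique_companies, expected_companies)
print(passed, message)
```

In the check itself, the message would presumably be wrapped in `MetadataValue.md(...)` and attached to the check result's metadata. The sketch stays in plain Python so it runs without Dagster installed.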

cnolanminich commented 6 months ago

Oh yeah, I agree! I'll push that change, and then this one will be in good shape.

cnolanminich commented 5 months ago

Fixed!

image