dagster-io / hooli-data-eng-pipelines

Example Dagster Cloud code for the Hooli Data Engineering organization.

Use dbt package for tests and reload manifest #66

Closed cnolanminich closed 5 months ago

cnolanminich commented 6 months ago

This PR does 2 things related to our use of dbt in the hooli project:

One note: I tried to use the new experimental DbtArtifacts class from 1.6.9, but it did not work. It looks like it will need to expose all the options of DbtCliResource (right now I don't see a good way to set the profiles-dir argument in that class). I'll post about it and see whether that's expected.

The code that didn't work looked like this:

dbt_artifacts = DbtArtifacts(
    project_dir=DBT_PROJECT_DIR,
    #profiles_dir=DBT_PROFILES_DIR,  # does not work / is not a positional arg
    #target="BRANCH",  # does not work / is not a positional arg
    prepare_command=["--quiet","parse","--target BRANCH"],
)
DBT_MANIFEST = dbt_artifacts.manifest_path
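
Separate from the missing profiles_dir option, one detail in the snippet above may also matter: "--target BRANCH" is a single list element. If prepare_command is handed to the dbt CLI as an argv list (an assumption; how DbtArtifacts actually uses it is not confirmed here), dbt would receive one token containing a space instead of a flag followed by its value. A minimal sketch of building the list with one token per element; build_dbt_args is a hypothetical helper, not part of dagster-dbt:

```python
from pathlib import Path
from typing import List, Optional, Union


def build_dbt_args(
    command: str,
    target: Optional[str] = None,
    profiles_dir: Optional[Union[str, Path]] = None,
    quiet: bool = False,
) -> List[str]:
    """Build a dbt CLI argument list with one argv token per element.

    Hypothetical helper for illustration only; not a dagster-dbt API.
    """
    args: List[str] = []
    if quiet:
        # --quiet is a global dbt flag, so it goes before the subcommand.
        args.append("--quiet")
    args.append(command)
    if target is not None:
        # Flag and value are separate elements, never "--target BRANCH".
        args.extend(["--target", target])
    if profiles_dir is not None:
        args.extend(["--profiles-dir", str(profiles_dir)])
    return args


print(build_dbt_args("parse", target="BRANCH", quiet=True))
# ['--quiet', 'parse', '--target', 'BRANCH']
```

Whether this alone would have made the DbtArtifacts attempt work is unknown; it only removes one possible cause.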

github-actions[bot] commented 6 months ago

Your pull request is automatically being deployed to Dagster Cloud.

| Location | Status | Link | Updated |
| --- | --- | --- | --- |
| demo_assets | | View in Cloud | Mar 19, 2024 at 03:44 PM (UTC) |
| snowflake_insights | | View in Cloud | Mar 19, 2024 at 03:44 PM (UTC) |
| basics | | View in Cloud | Mar 19, 2024 at 03:44 PM (UTC) |
| data-eng-pipeline | | View in Cloud | Mar 19, 2024 at 03:44 PM (UTC) |
| batch_enrichment | | View in Cloud | Mar 19, 2024 at 03:44 PM (UTC) |

slopp commented 6 months ago

The name of this asset check is kind of a bummer:

Screen Shot 2024-03-08 at 2 50 50 PM

Do you know what it looks like if it fails?

cnolanminich commented 5 months ago

> The name of this asset check is kind of a bummer:

I definitely agree. This is a dbt thing, so in the spirit of matching what this test would be called in dbt, I wasn't sure whether we should change it. I'm open to renaming it, especially for demo purposes.

> Do you know what it looks like if it fails?

Yeah, it's relatively uninformative tbh: right now it just tells you that it failed. If we want to get a bit fancier, we could read in run_results.json, along the lines of what @izzye84 did in #63, but pull out the test results to show how many rows failed. It won't show any more than that, since that's all dbt gives us, but it's certainly better than what is currently here.
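
A rough sketch of that idea, assuming the standard dbt run_results.json layout (a top-level results list whose entries carry unique_id, status, and failures). The function name, the sample data, and the test IDs below are made up for illustration and are not taken from #63:

```python
import json
from pathlib import Path
from typing import Dict


def failed_row_counts(run_results: dict) -> Dict[str, int]:
    """Map each failed dbt test's unique_id to its number of failing rows."""
    counts: Dict[str, int] = {}
    for result in run_results.get("results", []):
        if result.get("status") == "fail":
            # "failures" can be null in the artifact, so fall back to 0.
            counts[result["unique_id"]] = result.get("failures") or 0
    return counts


def load_run_results(path: str) -> dict:
    """Read a run_results.json file, e.g. target/run_results.json (assumed path)."""
    return json.loads(Path(path).read_text())


# Illustrative data shaped like a dbt run_results.json artifact.
sample = {
    "results": [
        {"unique_id": "test.my_project.not_null_orders_order_id", "status": "fail", "failures": 3},
        {"unique_id": "test.my_project.unique_orders_order_id", "status": "pass", "failures": 0},
        {"unique_id": "model.my_project.orders", "status": "success", "failures": None},
    ]
}

print(failed_row_counts(sample))
# {'test.my_project.not_null_orders_order_id': 3}
```

The resulting counts could then be attached to the asset check result as metadata, so a failure shows a row count instead of only a failed status.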

cnolanminich commented 5 months ago

Added the column schema info (pairs nicely with this PR, as it adds a dbt package, dagster, to emit the schema).

Taken from here: https://docs.dagster.io/integrations/dbt/reference#emit-column-schema-as-materialization-metadata-

It just works! Will want to see it in Snowflake as well as duckdb

image

slopp commented 5 months ago

I’d be good with merging