dagster-io / hooli-data-eng-pipelines

Example Dagster Cloud code for the Hooli Data Engineering organization.

add column-level-lineage to hooli #68

Closed: cnolanminich closed this pull request 5 months ago

github-actions[bot] commented 5 months ago

Your pull request is automatically being deployed to Dagster Cloud.

Location Status Link Updated
batch_enrichment View in Cloud Apr 05, 2024 at 05:38 PM (UTC)
data-eng-pipeline View in Cloud Apr 05, 2024 at 05:38 PM (UTC)
snowflake_insights View in Cloud Apr 05, 2024 at 05:38 PM (UTC)
basics View in Cloud Apr 05, 2024 at 05:38 PM (UTC)
demo_assets View in Cloud Apr 05, 2024 at 05:38 PM (UTC)
slopp commented 5 months ago

Something is still off: I don't see the column lineage in the UI, though I do now see the columns called out correctly on the asset detail page, e.g.:

https://hooli.dagster.cloud/3e1001d6fd55847cc8be2b49bfcd5f9992ffd676/assets/CLEANED/users_cleaned?view=overview&lineageScope=neighbors

https://hooli.dagster.cloud/3e1001d6fd55847cc8be2b49bfcd5f9992ffd676/assets/CLEANED/users_cleaned?view=lineage&lineageScope=neighbors

cnolanminich commented 5 months ago

@slopp I haven't tracked down why yet, but locally on DuckDB I'm getting this warning:

An error occurred while building column lineage metadata for the dbt resource models/CLEANED/users_cleaned.sql. Lineage metadata will not be included in the event. Exception: No expression was parsed from 'TIMESTAMP_NS'

In Snowflake, on the branch deployment:

An error occurred while building column lineage metadata for the dbt resource models/ANALYTICS/order_stats.sql. Lineage metadata will not be included in the event. Exception: No expression was parsed from 'TIMESTAMP_NTZ'
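Both warnings say lineage is dropped from the event while the run itself continues. That appears consistent with what @slopp saw: columns on the asset page, but no column lineage. The snippet below is a minimal sketch of that "degrade to a warning" pattern, assuming a hypothetical `build_event_metadata` helper and a caller-supplied `build_lineage` callable. It only imitates the warning text and is not dagster-dbt's actual implementation.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("lineage_sketch")


def build_event_metadata(
    resource_path: str,
    column_schema: Dict[str, str],
    build_lineage: Callable[[], Dict[str, List[str]]],
) -> Dict[str, object]:
    """Hypothetical sketch: always attach the column schema, attach lineage only if it builds."""
    metadata: Dict[str, object] = {"columns": column_schema}
    try:
        metadata["column_lineage"] = build_lineage()
    except Exception as exc:
        # A lineage parse failure is downgraded to a warning; the event is still emitted
        # with the column schema, just without the "column_lineage" entry.
        logger.warning(
            "An error occurred while building column lineage metadata for the dbt "
            "resource %s. Lineage metadata will not be included in the event. "
            "Exception: %s",
            resource_path,
            exc,
        )
    return metadata
```

With a `build_lineage` that raises, the returned dict contains only `"columns"`, which is the "columns visible, lineage missing" state described above.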

Initially I thought it was an issue with dbt_date (a dbt package imported by dbt_expectations), because I saw references to timestamp_ntz in macros in the manifest file. But I removed the package, deleted all the manifests, and rebuilt everything, and I'm still getting the same messages.
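To check which installed packages define macros that mention timestamp_ntz, one option is to scan the compiled manifest. This is a sketch assuming dbt's default `target/manifest.json` location and its top-level `macros` dict, where each entry carries `package_name` and `macro_sql`; the helper name `find_macros_mentioning` is hypothetical.

```python
import json
from pathlib import Path
from typing import List, Tuple


def find_macros_mentioning(manifest_path: str, needle: str) -> List[Tuple[str, str]]:
    """Return sorted (package_name, unique_id) pairs for macros whose SQL contains `needle`."""
    manifest = json.loads(Path(manifest_path).read_text())
    needle_lower = needle.lower()  # case-insensitive: TIMESTAMP_NTZ vs timestamp_ntz
    hits = []
    for unique_id, macro in manifest.get("macros", {}).items():
        if needle_lower in macro.get("macro_sql", "").lower():
            hits.append((macro.get("package_name", "?"), unique_id))
    return sorted(hits)
```

For example, `find_macros_mentioning("target/manifest.json", "timestamp_ntz")` would list any remaining dbt_date macros after a rebuild. As the comment notes, though, removing the package did not make the warnings go away.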

cnolanminich commented 5 months ago

Things look good after switching to get_env() to select the dbt run target. Thanks @rexledesma for identifying the root cause 🎉
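The thread doesn't show the `get_env()` change itself. Below is a minimal sketch of the idea, assuming the `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` and `DAGSTER_CLOUD_DEPLOYMENT_NAME` environment variables that Dagster Cloud sets, plus hypothetical dbt profile targets `LOCAL` (DuckDB, as in the local run above) and `BRANCH`/`PROD` (Snowflake, as in the branch deployment). The repo's real function and target names may differ.

```python
import os
from typing import List, Mapping

# Hypothetical target names: LOCAL -> DuckDB profile, BRANCH/PROD -> Snowflake profiles.
DBT_TARGETS = {"LOCAL": "LOCAL", "BRANCH": "BRANCH", "PROD": "PROD"}


def get_env(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical sketch: classify the deployment context from environment variables."""
    if environ.get("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
        return "BRANCH"
    if environ.get("DAGSTER_CLOUD_DEPLOYMENT_NAME"):
        return "PROD"  # assumption: any non-branch cloud deployment counts as prod
    return "LOCAL"


def dbt_build_args(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Build dbt CLI args whose --target follows the deployment context."""
    return ["build", "--target", DBT_TARGETS[get_env(environ)]]
```

Deriving the target from one function means the dbt target, and therefore the warehouse, always matches where the code is running, rather than relying on a profile default.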
