Starkie opened this issue 1 month ago.
Thank you for all the details! This is super helpful. We can try to reproduce it, but I think it's not a bad idea to raise this directly with sqlglot as well, to make sure the parser can handle these types of statements in general.
Hey all!
Describe the bug
We have a dbt Core project that targets a BigQuery data warehouse. When ingesting the metadata into DataHub with `include_column_lineage=true` and `prefer_sql_parser_lineage=true`, the column-level lineage (CLL) is missing for some of the datasets, while the table-level lineage is correct for all of them. We've tracked the issue down to a row deduplication macro from dbt_utils. The problem seems to be in the generated SQL code:
When we replace it with a simple `SELECT * FROM all_articles` statement, the CLL is generated correctly. We're not sure whether this is specific to DataHub or should be reported to sqlglot instead. Let me know and I can create the issue there.
To Reproduce
We have created a repository with a small dbt project that reproduces it: https://github.com/Starkie/datahub-dbt-lineage-repro
Steps to reproduce the behavior:
<> with the correct ones for your model:

```yaml
source:
  type: "dbt"
  config:
    env: dev
```