dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

[dagster-dbt] `fetch_column_metadata` fails when building models installed by dbt dependencies #23116

Open cbini opened 1 month ago

cbini commented 1 month ago

Dagster version

1.7.14

What's the issue?

When building column metadata for models that are installed from dbt dependencies, the function fails and Dagster logs a warning, presumably because the source SQL file doesn't exist in the project. I could be wrong: I would expect the SQL to be present in the target folder, but Dagster isn't finding it at the path shown below.

An error occurred while building column lineage metadata for the dbt resource `models/staging/stg_renlearn__fast_star.sql`. Lineage metadata will not be included in the event.
Exception: [Errno 2] No such file or directory: '/app/src/dbt/kippmiami/target/kippmiami_dbt_assets-ca7d822-bf32741/compiled/kippmiami/models/staging/stg_renlearn__fast_star.sql'
Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/dagster_dbt/core/resources_v2.py", line 1018, in _fetch_column_metadata
    lineage_metadata = _build_column_lineage_metadata(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/dagster_dbt/core/resources_v2.py", line 859, in _build_column_lineage_metadata
    parse_one(sql=node_sql_path.read_text(), dialect=sql_dialect),
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 1027, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 1013, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/app/src/dbt/kippmiami/target/kippmiami_dbt_assets-ca7d822-bf32741/compiled/kippmiami/models/staging/stg_renlearn__fast_star.sql'

What did you expect to happen?

To be clear, the run still succeeds and the asset materializes, but it would be great to find a way around this or explicitly log that these types of models aren't supported.
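
One possible workaround, sketched below, is to check for the compiled file before reading it and to skip lineage for that node if it is missing. This is only an illustration, not the dagster-dbt implementation: `read_compiled_sql` is a hypothetical helper, and the path layout `<target>/compiled/<package_name>/<original_file_path>` is an assumption about where dbt writes compiled SQL for models that come from a package (the traceback above looks under the root project name `kippmiami` instead).

```python
import logging
from pathlib import Path
from typing import Any, Mapping, Optional

logger = logging.getLogger(__name__)


def read_compiled_sql(
    target_path: Path, dbt_resource_props: Mapping[str, Any]
) -> Optional[str]:
    """Return the compiled SQL for a dbt node, or None if no file is found.

    Hypothetical helper, not part of dagster-dbt.
    """
    # Assumption: dbt writes compiled SQL under the name of the package that
    # owns the model, which differs from the root project for dependencies.
    candidate = (
        target_path
        / "compiled"
        / dbt_resource_props["package_name"]
        / dbt_resource_props["original_file_path"]
    )
    if not candidate.is_file():
        # State explicitly why lineage is skipped instead of raising.
        logger.warning(
            "Compiled SQL not found at %s; skipping column lineage.", candidate
        )
        return None
    return candidate.read_text()
```

With a guard like this, a missing file would produce one explicit log line for the affected model rather than a full traceback.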

How to reproduce?

Run a build that includes a dbt model installed from a dbt dependency:

dbt_build = dbt_cli.cli(args=["build"], context=context)

yield from dbt_build.stream().fetch_column_metadata()

Deployment type

Dagster Cloud

Deployment details

Hybrid

Additional information

No response


nixent commented 1 month ago

Some catalogs use two-component names, Spark for instance. In that case the "database" key will be present in manifest.json, but dbt_resource_props["database"] will be None.

I suggest adding an extra condition to this validation, so that None values are also rejected:

  if (
      "database" in dbt_resource_props
      and "schema" in dbt_resource_props
      and "alias" in dbt_resource_props
  ):
      relation_name = ".".join(
          [
              dbt_resource_props["database"],
              dbt_resource_props["schema"],
              dbt_resource_props["alias"],
          ]
      )
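
A minimal sketch of the extra condition, assuming the goal is simply to avoid joining a None value. `build_relation_name` is a hypothetical name used for illustration, and how the surrounding dagster-dbt code should handle a missing relation name is not shown here.

```python
from typing import Any, Mapping, Optional


def build_relation_name(dbt_resource_props: Mapping[str, Any]) -> Optional[str]:
    """Join database, schema, and alias, or return None if any part is unusable.

    Hypothetical helper, not part of dagster-dbt.
    """
    # .get() covers both cases: the key is absent, or it is present with a
    # None value (as described above for two-component catalogs like Spark).
    parts = [dbt_resource_props.get(key) for key in ("database", "schema", "alias")]
    if any(part is None for part in parts):
        return None
    return ".".join(parts)
```

Checking the values rather than only the keys avoids the `TypeError` that `".".join(...)` raises when one of the items is None.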