dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-643] Support package resources with `path:` selector method #5243

Open jtcohen6 opened 2 years ago

jtcohen6 commented 2 years ago

Prompts:

Short description

The dbt Cloud IDE uses `dbt list` behind the scenes to power the DAG viz. When it passes a selector such as `+dbt_packages/package_name/models/.../model_name.sql+`, dbt-core doesn't return any resources.

The reason: dbt-core's `path:` selector method doesn't support resources defined in packages.

https://github.com/dbt-labs/dbt-core/blob/72c17c446440afe7d4b6501ed364effe8e14c9f2/core/dbt/graph/selector_methods.py#L288-L289
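To make the failure concrete, here is a simplified, in-memory model of the mismatch. It assumes (our reading, not a quote of dbt-core) that the `path:` method resolves the selector against files under the root project and then compares the result to each node's `original_file_path`, which for a package resource is relative to the package's own root. `Node` and `current_path_match` are hypothetical names, not dbt-core's classes.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Set


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a manifest node; not dbt-core's class.
    unique_id: str
    package_name: str
    original_file_path: str  # relative to the defining package's root


def current_path_match(node: Node, selector: str, files_on_disk: Set[PurePosixPath]) -> bool:
    """Simplified model: resolve the selector against files under the root
    project, then check whether the node's original_file_path is among them."""
    target = PurePosixPath(selector)
    # Files equal to the selector, or inside a directory named by it.
    resolved = {p for p in files_on_disk if p == target or target in p.parents}
    return PurePosixPath(node.original_file_path) in resolved


# Files as they sit on disk, relative to the root project directory.
files_on_disk = {
    PurePosixPath("models/my_model.sql"),
    PurePosixPath("dbt_packages/dbt_project_evaluator/models/marts/dag/fct_direct_join_to_source.sql"),
}
root_node = Node("model.my_project.my_model", "my_project", "models/my_model.sql")
pkg_node = Node(
    "model.dbt_project_evaluator.fct_direct_join_to_source",
    "dbt_project_evaluator",
    "models/marts/dag/fct_direct_join_to_source.sql",
)

full_path = "dbt_packages/dbt_project_evaluator/models/marts/dag/fct_direct_join_to_source.sql"
print(current_path_match(root_node, "models", files_on_disk))  # True
print(current_path_match(pkg_node, full_path, files_on_disk))  # False
print(current_path_match(pkg_node, "models/marts/dag", files_on_disk))  # False
```

Neither the full on-disk path nor the package-relative path selects the package model: the first resolves to a file whose path doesn't equal `original_file_path`, and the second doesn't exist under the root project.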

Proposed resolution

Update `FileSelectorMethod` to accept the full relative path of a resource defined in a package, including the install path and the package name, as a way to select that resource.

For example, passing `dbt_packages/dbt_project_evaluator/models/marts/dag/fct_direct_join_to_source.sql` should select `fct_direct_join_to_source`.
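A minimal sketch of the proposed matching, under stated assumptions: a package resource is selectable by `<install_path>/<package_name>/<original_file_path>`, with `install_path` defaulting to `dbt_packages` and the installed directory assumed to be named after the package (which, as noted under Alternatives, may not hold for non-Hub packages). `Node`, `selectable_path`, and `select_by_path` are hypothetical helpers for illustration, not dbt-core APIs.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a manifest node; not dbt-core's class.
    unique_id: str
    package_name: str
    original_file_path: str  # relative to the defining package's root


def selectable_path(node: Node, root_project: str, install_path: str = "dbt_packages") -> PurePosixPath:
    """Full relative path (from the root project) a client would pass."""
    ofp = PurePosixPath(node.original_file_path)
    if node.package_name == root_project:
        return ofp
    # ASSUMPTION: the package is installed at <install_path>/<package_name>/.
    return PurePosixPath(install_path) / node.package_name / ofp


def select_by_path(
    nodes: Iterable[Node], selector: str, root_project: str, install_path: str = "dbt_packages"
) -> Iterator[str]:
    """Yield unique_ids whose full path equals the selector or lies under it."""
    target = PurePosixPath(selector)
    for node in nodes:
        path = selectable_path(node, root_project, install_path)
        if path == target or target in path.parents:
            yield node.unique_id


nodes = [
    Node("model.my_project.my_model", "my_project", "models/my_model.sql"),
    Node(
        "model.dbt_project_evaluator.fct_direct_join_to_source",
        "dbt_project_evaluator",
        "models/marts/dag/fct_direct_join_to_source.sql",
    ),
]
```

With this, `list(select_by_path(nodes, "dbt_packages/dbt_project_evaluator/models/marts/dag/fct_direct_join_to_source.sql", "my_project"))` returns only the package model, while `"models"` still selects only root-project models, so a client can pass the same kind of path for both.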

Acceptance criteria

Suggested tests

https://github.com/dbt-labs/dbt-core/blob/e81f7fdbd5b248d7cc24847da1ac6eee453f9d1a/tests/unit/test_graph_selector_methods.py#L1150

https://github.com/dbt-labs/dbt-core/blob/main/tests/functional/list/test_list.py

Alternatives

Both of these alternatives (the original resolutions I suggested in ~2022) would require additional logic on the "client" side. I believe a preferable resolution allows clients to pass the full (relative) path to all resources, both those defined in the root project and those defined in packages, without any additional logic.

  1. Start returning package resources based on their `original_file_path` (always relative), and expect services passing file names into `dbt list` to trim off `dbt_packages/package_name/`. Because models with the same relative path can be defined in multiple installed packages, this could return multiple models for a single path. For example, `models/my_model.sql` would select every model defined at that relative path, even if they're defined in different packages. I think that's fair, and in keeping with expected behavior.

  2. Add a new `FileIDSelector`. dbt-core does have a notion of an internal "file ID" that looks like `package_name://relative/file/path`, which it uses to uniquely identify each file during partial parsing. That still requires knowing the name of the package, though. I'm pretty sure `dbt deps` takes care of this by installing each package into a directory named after it. (That should hold for Hub packages, based on how `dbt deps` renames the package after installation, but not necessarily for non-Hub packages.)
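The client-side logic each alternative implies can be sketched as follows. `trim_package_prefix` and `to_file_id` are hypothetical helpers; only the `package_name://relative/file/path` shape comes from the description above. The sketch assumes forward-slash paths and that the install directory name equals the package name, which per the caveat above may not hold for non-Hub packages.

```python
from pathlib import PurePosixPath
from typing import Tuple


def trim_package_prefix(path: str, install_path: str = "dbt_packages") -> Tuple[str, str]:
    """Alternative 1 (client side): split a path into (package directory,
    path relative to that package). Returns ("", path) for root-project files."""
    parts = PurePosixPath(path).parts
    if len(parts) > 2 and parts[0] == install_path:
        # NOTE: parts[1] is the directory name, assumed to equal the package name.
        return parts[1], str(PurePosixPath(*parts[2:]))
    return "", path


def to_file_id(package_name: str, relative_path: str) -> str:
    """Alternative 2: build an ID shaped like package_name://relative/file/path."""
    return f"{package_name}://{relative_path}"


# Hypothetical data: two packages that each define models/my_model.sql.
files = {"pkg_a": "models/my_model.sql", "pkg_b": "models/my_model.sql"}

_, rel = trim_package_prefix("dbt_packages/pkg_a/models/my_model.sql")
# Alternative 1: the trimmed path alone matches files in both packages.
matches = sorted(pkg for pkg, p in files.items() if p == rel)
# Alternative 2: file IDs keep the package name, so each file is distinct.
ids = {to_file_id(pkg, p) for pkg, p in files.items()}
print(matches)
print(sorted(ids))
```

This illustrates the trade-off: trimming is simple but ambiguous across packages, while file IDs stay unique but require the client to know the package name.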

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

racheldaniel commented 3 months ago

The XP team is watching this as a possible solution for issues with compile/preview not working on dbt packages in the IDE. Per a Slack thread with Core, this issue is to be reopened.

ChenyuLInx commented 2 months ago

We might need to insert the `dbt_packages` path into `original_file_path`.

ChenyuLInx commented 2 months ago

Whoever picks this one up, please pair with @gshank.