dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.91k stars 1.63k forks source link

[CT-775] [Enhancement] `dbt deps` does not correctly resolve local dependency of local dependency #5410

Open skirino opened 2 years ago

skirino commented 2 years ago

Is there an existing issue for this?

Current Behavior

Suppose there are the following local dependency relations: pkg1 => pkg2 => pkg3. Then, the directory of pkg3 is specified by packages.yml file of pkg2. But, when running dbt deps in pkg1, pkg3 is searched from pkg1's directory, not from pkg2's directory.

Expected Behavior

The directory of pkg3 should be resolved from pkg2's directory, so that pkg3 can be found regardless of the location of pkg1.

Steps To Reproduce

I've prepared a minimal repro: https://github.com/skirino/dbt_local_dep_path_issue

  1. git clone https://github.com/skirino/dbt_local_dep_path_issue.git
  2. cd dbt_local_dep_path_issue/b/c
  3. dbt deps

    [2022-06-27 21:01:17] $ dbt deps                       [~/tmp/dbt_local_dep_path_issue/b/c] 0s
    12:01:20  Running with dbt=1.1.1
    12:01:21  Encountered an error:
    Runtime Error
      no dbt_project.yml found at expected path /Users/skirino/tmp/dbt_local_dep_path_issue/b/common2/dbt_project.yml

Relevant log output

No response

Environment

- OS: macOS 12.4
- Python: 3.7.13
- dbt: 1.1.1

What database are you using dbt with?

snowflake

Additional Context

This is the same issue as described in https://github.com/dbt-labs/dbt-core/issues/4538#issuecomment-1127034448, but I think it deserves a separate issue.

ChenyuLInx commented 2 years ago

Putting the language tag since @emmyoop is more familiar with the dbt deps

emmyoop commented 2 years ago

I always love a good reproduction repo. ❤️ Thanks for that @skirino!

I agree this is related to #4538 but I think it may have to do with base_path as well so a separate issue is appropriate. This will take some more research and I'm not sure of the consequences of supporting this (circular dependencies, specifically).

I'm also going to change this to an enhancement, because similar to #5374 this was not the original intent on how this would be used. @jtcohen6 please correct me if there's something missing.

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

jaklan commented 7 months ago

Bumping the issue, it makes local packages in the monorepo setup completely unusable. And the above @emmyoop's statement:

this was not the original intent on how this would be used

is not valid anymore, as the monorepo use-case is mentioned directly in local package docs nowadays:
https://docs.getdbt.com/docs/build/packages#local-packages

cc: @jtcohen6, because that's absolutely critical issue for our dbt monoproject refactor

jaklan commented 7 months ago

@dbeatty10 could you re-open the issue not to create duplicates?

jtcohen6 commented 7 months ago

@jaklan Thanks for flagging this - it's worth another look given the changes to packages/progress over the past few years.

Right now, the local package's resolved path is always relative to the root project from which you're running dbt deps:

https://github.com/dbt-labs/dbt-core/blob/71f3519611a6a21c2cb492df2d54e60af040e0f3/core/dbt/deps/local.py#L44-L48

Using OP's reproduction case, project.project_root is always Path('/Users/.../dbt_local_dep_path_issue/b/c'). We'd need to find a way to instead pass in the absolute path of the package's packages.yml, because what we really want there is Path('/Users/.../dbt_local_dep_path_issue/common1') + Path('../common2').

So we'd need a way to get at the location of the packages.yml file in which each local package specification is actually defined. That packages.yml could very likely be located within the downloads directory, if it belongs to a package which was itself installed via git / Hub / tarball method. I think the trickiness here will be a mix of plumbing (the deps classes have several layers of nesting) and of thinking through potential edge cases at the intersection of multiple deps methods.

jaklan commented 7 months ago

@jtcohen6 thanks for picking up the issue. I fully agree it gets quite tricky when we consider mix of various installation methods, however - maybe it's worth to split it into two phases: a) temporary solution to fix the local-of-local scenario b) the final, robust one handling various combinations.

I can imagine a use-case, where we install a package via git and that package has some local dependency (e.g. a common package in its repo) - but then the complexity is getting really serious, we have to re-think such edge-cases like e.g. sparse checkout when using subdirectory for packages relying on local dependencies in another directories:

If the packaged project is instead nested in a subdirectory—perhaps within a much larger mono repo—you can optionally specify the folder path as subdirectory. dbt will attempt a sparse checkout of just the files located within that subdirectory.

and most likely much more other not-so-obvious scenarios.

But having said that, when we talk about the monorepo setup - I believe the most common scenario is the local-of-local one, and solving only that scenario would be enough to unblock most monorepo users and give you time to re-design the current approach in a more comprehensive way.

misteliy commented 3 months ago

Hi team, is there a possibility we could tackle the local-of-local use-case for now? This issue is important for all customers that want to rely on dbt packages more across dbt core and cloud...

dbeatty10 commented 3 months ago

@misteliy I'm going to take a closer look at the local-of-local use-case during the next sprint.

Does the example in https://github.com/skirino/dbt_local_dep_path_issue encapsulate the local-of-local use-case that you're thinking of?

jaklan commented 3 months ago

@dbeatty10 yep, based on the README - that's exactly the issue

misteliy commented 2 months ago

@dbeatty10 keep us posted - thank you!

dbeatty10 commented 2 months ago

@jaklan and @misteliy I took a look at this and put up a draft PR: https://github.com/dbt-labs/dbt-core/pull/10600

It worked when I tested it against https://github.com/skirino/dbt_local_dep_path_issue, but it's failing a couple CI tests (specifically these two). I haven't yet had a chance to dig into the reason why.

jaklan commented 2 months ago

@dbeatty10 thank you for taking care of it, highly appreciated!