dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] dbt deps automatically recognizes projects in subdirectories #9719

Open djbelknapaw opened 4 months ago

djbelknapaw commented 4 months ago

Is this a new bug in dbt-core?

Current Behavior

I'm attempting to build an integration_tests sub-project similar to the one in dbt_utils, then install the parent project when running the integration tests. My my_project/integration_tests/packages.yml file is the same as the one in dbt_utils:

packages:
    - local: ../

This installs the parent project as a package, but the installed copy includes the integration_tests child project. dbt recognizes that child as a project and attempts to install its dependencies again, which results in endless recursion.

I end up with a directory structure of:

my_project
/integration_tests
/integration_tests/dbt_packages/my_project
/integration_tests/dbt_packages/my_project/integration_tests
/integration_tests/dbt_packages/my_project/integration_tests/dbt_packages/my_project
...

Eventually deps fails with an error: "[WinError 206] The filename or extension is too long: 'dbt_packages\\my_project\\integration_tests\\dbt_packages\\my_project\\integration_tests\\dbt_packages\\my_project\\integration_tests\\dbt_packages\\my_project\\integration_tests\\dbt_packages\\my_project\\integration_tests'"

Expected Behavior

When running dbt deps against a local package, dbt should only recognize the dbt_project.yml and packages.yml of the directly referenced project, not those in sub-project directories. In this example, dbt should only look at ../packages.yml and not at ../integration_tests/packages.yml.

Steps To Reproduce

  1. Create a project containing a sub-project
  2. In the sub-project, add a package pointing to the parent project
  3. Run dbt deps

Relevant log output

No response

Environment

- OS: Windows 10
- Python: 3.11.7
- dbt: 1.7.9

Which database adapter are you using with dbt?

No response

Additional Context

No response

dbeatty10 commented 4 months ago

Thanks for raising this issue @djbelknapaw !

Does this only happen on Windows when using a PowerShell or cmd.exe terminal? Or does it also happen when using WSL (Windows Subsystem for Linux)?

Suspected root cause

Here's what I think is happening:

👉 When dbt installs a local package, it uses a symlink if it can. Otherwise, it makes a copy of the entire directory.

My understanding is that some Windows environments don't allow creation of a symlink, so dbt installs the package via the copy approach instead. Since the install location is a subdirectory of the package being installed, the copy keeps descending into its own output, which is the recursive behavior you observed.
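To make the symlink-or-copy fallback concrete, here is a minimal sketch of that pattern. It is not dbt's actual code: the function name `install_local_package` and its return values are hypothetical, and it assumes `dest_path` does not exist yet.

```python
import os
import shutil


def install_local_package(src_path, dest_path):
    """Link dest_path to src_path if possible, otherwise copy the tree.

    Hypothetical helper for illustration only. Returns "symlink" or "copy".
    """
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    try:
        # On Windows this raises OSError when the process lacks the
        # privilege to create symlinks (no admin rights, no developer mode).
        os.symlink(os.path.abspath(src_path), dest_path, target_is_directory=True)
        return "symlink"
    except OSError:
        # Fallback: a full copy. If dest_path lives inside src_path, this is
        # the step that can keep descending into its own output.
        shutil.copytree(src_path, dest_path)
        return "copy"
```

With a symlink there is nothing to recurse into, which would explain why only the copy fallback shows the problem.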

Potential solution

One approach we can consider when a symlink is not possible:

  1. Instead of copying directly to the dest_path, use a temporary location as an intermediate.
  2. Then move it from the intermediate location to the final dest_path location.

Here's the relevant source code: https://github.com/dbt-labs/dbt-core/blob/6fd0a947297056e4a92d796111a2be578b774b47/core/dbt/deps/local.py#L65-L70
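A rough sketch of the two-step idea above, separate from the linked dbt code. The helper name `copy_via_staging` is hypothetical, and the sketch assumes `dest_path` does not exist before the call.

```python
import os
import shutil
import tempfile


def copy_via_staging(src_path, dest_path):
    """Copy src_path to dest_path through a temporary directory.

    Hypothetical helper for illustration only. The snapshot is taken
    outside the source tree, so the copy cannot descend into its own output.
    """
    with tempfile.TemporaryDirectory() as staging_root:
        staged = os.path.join(staging_root, "package")
        # Step 1: copy to an intermediate location outside src_path.
        shutil.copytree(src_path, staged)
        # Step 2: move the finished snapshot to its final location.
        os.makedirs(os.path.dirname(dest_path), exist_ok=True)
        shutil.move(staged, dest_path)


def demo():
    """Rebuild the layout from this issue and report what ends up nested."""
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "my_project")
        os.makedirs(os.path.join(src, "integration_tests"))
        open(os.path.join(src, "dbt_project.yml"), "w").close()
        dest = os.path.join(src, "integration_tests", "dbt_packages", "my_project")
        copy_via_staging(src, dest)
        nested = os.path.join(dest, "integration_tests", "dbt_packages")
        return os.path.isfile(os.path.join(dest, "dbt_project.yml")), os.path.exists(nested)
```

One caveat with this approach: if the source tree already contains packages from an earlier install, the snapshot still includes one level of them, so the copy is finite but larger than necessary.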

djbelknapaw commented 4 months ago

Correct: WSL successfully sets up the symlink and doesn't error out. Also, when I run PowerShell as admin it's able to create a symlink, so the problem is specific to the copytree code path.

It seems like you could just use the ignore parameter of copytree so that it stops recursing into installed packages when it reaches the project_root of the project calling deps in the source tree? This worked in a really quick local copy, but I'm not sure if it works more broadly.

shutil.copytree(src_path, dest_path,
                # ignore must return a collection of entry names, so wrap the name in a list
                ignore=lambda directory, contents:
                    [project.packages_install_path] if directory == project.project_root
                    else [])

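For anyone who wants to try the ignore idea outside of dbt, here is a self-contained sketch. The names `copy_skipping_install_dir` and `install_dir` are hypothetical stand-ins for the project attributes in the snippet above, and it assumes the install directory is a single folder name directly under the project root.

```python
import os
import shutil
import tempfile


def copy_skipping_install_dir(src_path, dest_path, project_root, install_dir="dbt_packages"):
    """Copy src_path to dest_path, skipping install_dir directly under project_root."""
    root = os.path.normcase(os.path.abspath(project_root))

    def ignore(directory, contents):
        # copytree expects a collection of entry names to skip in `directory`.
        if os.path.normcase(os.path.abspath(directory)) == root:
            return {install_dir} & set(contents)
        return set()

    shutil.copytree(src_path, dest_path, ignore=ignore)


def demo():
    """Build the layout from this issue in a temp dir and list the copied files."""
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "my_project")
        project_root = os.path.join(src, "integration_tests")
        os.makedirs(project_root)
        for path in (os.path.join(src, "dbt_project.yml"),
                     os.path.join(project_root, "packages.yml")):
            open(path, "w").close()
        dest = os.path.join(project_root, "dbt_packages", "my_project")
        copy_skipping_install_dir(src, dest, project_root)
        return sorted(
            os.path.relpath(os.path.join(d, f), dest).replace(os.sep, "/")
            for d, _, files in os.walk(dest) for f in files
        )
```

Returning a set of names matters here: copytree checks each entry with `name in ignored_names`, so a bare string would silently turn that into a substring match.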
dbeatty10 commented 4 months ago

💡 Great idea about the ignore parameter @djbelknapaw !

No pressure, but are you interested in opening a PR that includes your solution, by any chance?

djbelknapaw commented 4 months ago

@dbeatty10 The PR is open! This is my first time contributing here, so let me know if there's anything I should do differently.