Closed stevenayers closed 1 week ago
Thanks for opening this @stevenayers !
Can you share more about the specific use cases where combining a CLI flag with an environment variable is necessary or beneficial, versus merely including the packages-install-path configuration in dbt_project.yml?
Hi @dbeatty10, sure no problem! Let me break this down a bit.
packages-install-path
1. In scenarios where Docker containers are being used, this can cause difficulties. I won't go into too much detail because it's been documented quite well in this issue: https://github.com/dbt-labs/dbt-core/issues/1710.
2. When you are dealing with a lot of orchestration/workflow systems, you will often find that each step runs in a different working directory from the previous one, and that these directories are often dynamic. Take this pipeline as an example:
graph LR;
A[dbt debug]-->B[dbt run];
B-->C[dbt test];
C-->D[dbt docs generate];
Each working directory could look something like /tmp/job-id/step-id:
dbt debug: /tmp/1ad0ceb/ee74a60082b34c3a3d0df8a0d5d5cbfd7ec5ed6a
dbt run: /tmp/1ad0ceb/607646b627e80fe5e45545589fc8c09482010978
dbt run: /tmp/1ad0ceb/7e164e3ab723c357cb638ad6c1e1beef19a7fec6
dbt test: /tmp/1ad0ceb/cb56f4fdc16d5a79953af3003645a1af5a000926
With this, you don't want to be re-installing your deps at every stage, and likely want to reuse them. This is where, like in issue #1710, you will want to use an environment variable like:
config-version: 2
packages-install-path: "{{ env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages') }}"
You could set packages-install-path: "../dbt_packages", but that's making assumptions when you sometimes need to use shell script logic to figure out what that directory path needs to be.
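As a rough sketch of the kind of shell logic meant here, assuming each step runs in /tmp/job-id/step-id as in the example above and that dbt_project.yml reads DBT_PACKAGES_INSTALL_PATH through env_var as shown, a step could derive a job-level directory from its own working directory before calling dbt. The helper name shared_packages_dir is hypothetical and only for illustration.

```shell
# Hypothetical helper: given a step's working directory of the form
# /tmp/<job-id>/<step-id>, print a packages directory that every step
# of the same job would resolve to.
shared_packages_dir() {
    step_dir="$1"
    job_dir="$(dirname "$step_dir")"   # drop the <step-id> component
    printf '%s/dbt_packages\n' "$job_dir"
}

# Assumption: the orchestrator starts each step inside its own step directory,
# so the current directory stands in for /tmp/<job-id>/<step-id>.
DBT_PACKAGES_INSTALL_PATH="$(shared_packages_dir "$PWD")"
export DBT_PACKAGES_INSTALL_PATH
echo "packages directory for this job: $DBT_PACKAGES_INSTALL_PATH"
```

With this in place, every step of one job points at the same directory, so dependencies installed by an earlier step can be reused by later ones.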
3. Say you have set packages-install-path to /tmp/my_custom_packages_path so it can be shared between steps. What if you're also running your CI/CD test pipeline in that environment?
Your packages.yml is changed in your feature branch, which updates the package contents in /tmp/my_custom_packages_path. Your live data pipeline is in the middle of running, and when it goes to run, it fails because your feature branch has removed packages that the live pipeline was using.
This is where you'll want to do something like:
config-version: 2
packages-install-path: "{{ env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages') }}"
and in your pipeline you'll want to set DBT_PACKAGES_INSTALL_PATH to something like /tmp/${ENVIRONMENT}/dbt_packages.
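A minimal sketch of that pipeline step, assuming an ENVIRONMENT variable is provided by the orchestrator (the fallback value "ci" and the helper name packages_path_for_env are hypothetical):

```shell
# Hypothetical helper: print an environment-scoped packages path so that
# a CI run and a live run never write to the same directory.
packages_path_for_env() {
    env_name="$1"
    printf '/tmp/%s/dbt_packages\n' "$env_name"
}

# Assumption: ENVIRONMENT is set by the pipeline; fall back to "ci" otherwise.
ENVIRONMENT="${ENVIRONMENT:-ci}"
DBT_PACKAGES_INSTALL_PATH="$(packages_path_for_env "$ENVIRONMENT")"
export DBT_PACKAGES_INSTALL_PATH
echo "packages directory for $ENVIRONMENT: $DBT_PACKAGES_INSTALL_PATH"
```

Because the path includes the environment name, a feature branch that changes packages.yml only touches its own directory and leaves the one used by the live pipeline alone.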
packages-install-path
As I mentioned in the original issue, sometimes setting an environment variable can be a pain in some workflow systems. This also isn't very consistent or clean:
DBT_PACKAGES_INSTALL_PATH=/tmp/${ENVIRONMENT}/dbt_packages dbt run --target-path /tmp/${ENVIRONMENT}/target
You're setting config paths via two different methods.
Yesterday @jtcohen6 and I had a chance to discuss the proposed new CLI flag + environment variable.
We've taken different approaches to where settings can be configured, depending on the use case: configs in the dbt_project.yml file are reserved for things that don't change (very often) and are shared across users and invocations, whereas CLI flags and environment variables are for things that vary between invocations. So generally, we don't let these be set in both places, and it would take a really compelling case for us to do so.
In this case, it sounds like the main barrier is that setting environment variables is difficult within Databricks DBT Workflows. If this is the primary barrier, then we'd prefer not to add a new feature to dbt in order to work around it.
So we're closing this and the associated PR in https://github.com/dbt-labs/dbt-core/pull/9933 as not planned.
But if anyone can provide additional examples of why we should consider supporting a new --packages-install-path CLI flag (and associated DBT_PACKAGES_INSTALL_PATH environment variable) outside of Databricks DBT Workflows, we'd be willing to take another look.
Is this your first time submitting a feature request?
Describe the feature
Add a CLI parameter for the packages-install-path, similar to how target-path has one. In the docs, under target-path, it says:
Describe alternatives you've considered
Using the env var DBT_PACKAGES_INSTALL_PATH. The issue here is that some orchestration tools, such as Databricks DBT Workflows, make setting environment variables very difficult. By adding this CLI parameter, we maintain consistency across global configs.
Who will this benefit?
People using orchestration tools with awkward limitations.
Are you interested in contributing this feature?
Yes, the PR is https://github.com/dbt-labs/dbt-core/pull/9933