dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.99k stars 1.63k forks source link

[CT-27] [Feature] invalidate the packages folder when packages.yml changes #4557

Open joellabes opened 2 years ago

joellabes commented 2 years ago

Is there an existing feature request for this?

Describe the Feature

A frequent issue for both Cloud and Core users is that when you update your packages.yml file, if you don't remember to run dbt deps you will stay stuck on the old version.

In the last couple of days I've seen a couple of cases of this (1, 2) in the rollout of utils 0.8.0.

What I'd like to see is something similar to partial parsing being invalidated when dbt_project.yml changes. When you change your packages.yml, if you don't run deps before your next dbt invocation, it should either fail with a clear warning (sort of like what it does when the number of packages listed in the file doesn’t match the number of directories installed), or even classier, run deps for you first.

Describe alternatives you've considered

Nothing?

Who will this benefit?

Pretty much everyone who uses packages has been bitten by this at some time or other.

Are you interested in contributing this feature?

maybe! Depends on how big a lift it'd be

Anything else?

No response

gshank commented 2 years ago

This is a very reasonable request. But running dbt deps on changes is a bigger lift because of the way that our tasks work. We are looking at the possibilities of splitting out the deps code, but that's probably not going to happen in the near future. We could do a similar thing to what we do in partial parsing, the complication there is that it would be easy to put it in the partial parse msgpack file, but that would mean you'd only get the checking when partial parsing is being used, so we'll have to put it in some other sort of state file.

github-actions[bot] commented 2 years ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

joellabes commented 2 years ago

Pragmatic me wants to say it's OK to only do this if partial parsing is enabled, but that sounds like it's signing us up to get a confusing bug report in a few months when someone without partial parsing gets caught out.

Is there any technical aversion or added complexity to adding another state file for packages?

joellabes commented 1 year ago

This just bit me again - I updated my branch on our internal analytics project to get the new reset_dev_env macro, and in doing so collected a shiny new packages.yml file which added a new Fivetran package. I couldn’t work out why I was getting warnings about not being able to find the referenced model ad_reporting__url_report on a brand new clone until I realised that it was provided by a package.

jtcohen6 commented 1 year ago

@dbeatty and I chatted about this again last week - written-up thoughts forthcoming

joellabes commented 1 year ago

I just noticed this log line:

05:22:13  Running with dbt=1.3.0
05:22:13  Unable to do partial parsing because a project dependency has been added

does this mean dbt already knows what dependencies exist in a partial parsing context?

jtcohen6 commented 1 year ago

@joellabes Yes - but only once parsing has begun, and the packages have already been installed. It isn't something that dbt knows as part of the deps codepath.

Doug's write-up: https://github.com/dbt-labs/dbt-core/issues/6643

The big idea in "Phase 2" comes down to: If we had a clean record of the packages installed the last time we ran dbt deps, we could "resolve" the current packages.yml, and calculate a diff from the last install (dbt_packages/packages.json). We could do that using just Hub-requested metadata, without actually downloading any packages. (We've talked elsewhere about creating this clear separation, in deps, between "resolution" and "installation" steps.) Then, we could do a "slim" deps (just update changed packages). Maybe it gets to be so fast that, if we also cached Hub metadata, this could just be part of parsing...

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 9 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 8 months ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

iamtodor commented 8 months ago

@joellabes can we please reopen this issue?

github-actions[bot] commented 2 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

iamtodor commented 2 months ago

Comment to prevent closing