dbt-labs / hubcap

This app adds modules to the hubsite at hub.getdbt.com

feat: source_db package #280

Closed BfdCampos closed 1 year ago

BfdCampos commented 1 year ago

Description

Tell us about your new package!

Link to your package's repository: https://github.com/BfdCampos/source_db/

Checklist

This checklist is a cut-down version of the best practices that we have identified as the package hub has grown. Although meeting these checklist items is not a prerequisite to being added to the Hub, we have found that packages which don't conform provide a worse user experience.

First run experience

Customisability

Dependencies

Dependencies on dbt Core

Versioning

joellabes commented 1 year ago

Hey @BfdCampos, this one confuses me a bit! I don't really get why you would need to do this - it looks like your goals would be met by using deferral, in particular --favor-state and perhaps explicitly setting a --defer-state.

Because of this, I'm wary of adding it to the package hub, as it might get other users into a confusing situation where they inadvertently use this instead of taking advantage of the native functionality.

Can you tell me more?

BfdCampos commented 1 year ago

Hi @joellabes, thanks so much for reviewing and reaching out!

So this package is designed to dynamically set the database source based on an environment variable. It is useful for conditional database routing but, from what I understand from the documentation, it works differently from the defer feature.

The source_db package lets the user switch between databases on the fly based on environment variables, which can be declared with your dbt command or at a global environment level for a multi-tenant db setup. We find a lot of use for this at my own company when we need to test a run locally based off of data in other databases (dev or prod). This is particularly useful for us because we cannot run dbt in production ourselves at all, only via automated systems, which means we have no access to the production artefacts.
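To make the idea concrete, here is a minimal Python sketch of environment-variable routing. It is not the package's actual code: the variable name SOURCE_DB, the default dev_db, and the function name are all hypothetical and only illustrate "use the database named in the environment, otherwise fall back to a default".

```python
import os

# Hypothetical default; not taken from the source_db package.
DEFAULT_DATABASE = "dev_db"


def resolve_source_database(env=None, default=DEFAULT_DATABASE):
    """Return the database to read sources from.

    Prefers the (hypothetical) SOURCE_DB environment variable and falls
    back to `default` when the variable is unset or blank.
    """
    env = os.environ if env is None else env
    value = env.get("SOURCE_DB", "").strip()
    return value or default
```

Passing the environment in as a mapping keeps the function easy to test; in real use it would simply read from os.environ.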

From what I understood, the defer feature is particularly useful for optimising computational resources in CI. It switches between databases or schemas based on the existence of a model in the current environment, automatically referring to a production model if a development one does not exist. However, it requires that a manifest from a previous dbt invocation be passed to the --state flag or env var, which is not possible for us. So this feature does not work for our use case, unfortunately.
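For contrast, the deferral behaviour as described above can be modelled roughly like this in Python. This is a simplified sketch of the description in this comment, not dbt's implementation: the function, its parameters, and the dict-shaped "manifest" are all assumptions for illustration.

```python
def resolve_relation(model, dev_existing, state_manifest, dev_schema="dev"):
    """Pick the relation a model reference should point at.

    dev_existing:   set of model names already built in the dev environment
    state_manifest: mapping of model name -> production relation, standing in
                    for the manifest of a previous invocation (None if absent)
    """
    if state_manifest is None:
        # Without a previous manifest there is nothing to defer to.
        raise ValueError("deferral needs a manifest from a previous invocation")
    if model in dev_existing:
        # The development version exists, so use it.
        return f"{dev_schema}.{model}"
    if model in state_manifest:
        # Otherwise fall back to the production relation from the manifest.
        return state_manifest[model]
    raise KeyError(f"model {model!r} not found in dev or in the manifest")
```

The key point for our case is the first branch: with no manifest available, this style of routing cannot work at all, whereas an environment variable needs no prior artefacts.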

In summary, the source_db package allows for conditional logic based on environment variables, enabling more complex routing that defer does not offer at this time.

I would also be happy if this feature was taken up by dbt as a default command if you think it adds value 😊

(Anecdotal, but the main reason I even made this PR is that I've already sent an exact copy of the package's code to 4 friends at different companies because they found it useful, so I thought: instead of having to send it manually, why not make it into a package?)

joellabes commented 1 year ago

Works for me! It might be worth adding a bit more of that context to your readme to help folks understand the contexts where it's useful, but let's get this merged!

BfdCampos commented 1 year ago

Perfect, will do now!! Thanks so much @joellabes 🙌🙌

BfdCampos commented 1 year ago

@joellabes apologies for the delay. This week has gotten the better of me, but I have added the notes to the README as you suggested and made another release so that the package has the latest information. Thanks 😊