dbt-labs / dbt-postgres

Apache License 2.0
22 stars 12 forks source link

[Feature] Make `psycopg2-binary` a default dependency, add `psycopg2` as an optional dependency #6

Closed mikealfare closed 5 months ago

mikealfare commented 7 months ago

Is this your first time submitting a feature request?

Describe the feature

In the dbt-core repo, dbt-postgres has a conditional build dependency for psycopg2. dbt-postgres defaults to psycopg2-binary unless overridden by an environment variable. With the move to pyproject.toml, we'll need a different way to handle this.

Proposal

Default to psycopg2-binary in the primary build dependencies. Offer psycopg2 via an optional extra. pyproject.toml would look like this:

# pyproject.toml
dependencies = [
    "psycopg2-binary",
    ...
]
[project.optional-dependencies]
compile = ["psycopg2"]
...

This would offer the following installation options:

Who will this benefit?

This provides flexibility for folks installing dbt-postgres. Some users need the binary version of psycopg2 to avoid installing other build tooling. This is ideal for development. Others need the source version of psycopg2 so that it gets compiled to their machine. This is ideal for production.

### Tasks
- [ ] Make the change in pyproject.toml
- [ ] Update docs with how to install this as a user
colin-rogers-dbt commented 7 months ago

cc: @dbeatty10 @jtcohen6 as they probably have some context

This would be a departure for users' current install behavior as it will require people to have OS dependencies preinstalled (i.e. a C compiler, various libs etc) so we'll need to pair this with a docs update (something we'll need to do anyways as part of decoupling)

dbeatty10 commented 7 months ago

My typical instinct is to preserve whatever the current behavior is. In this case, that would be keeping psycopg2-binary within the dependencies, and adding psycopg2 to the optional dependencies.

What would be the advantages of the proposed order of psycopg2 as the default and psycopg2-binary as optional?

colin-rogers-dbt commented 7 months ago

having given this some thought I think I agree, ultimately for production use cases users should install psycopg2 as it's significantly more performant however for users installing locally (and for our testing environments) psycopg2-binary is the better choice as it requires the least amount of configuration and is faster to install.

mikealfare commented 7 months ago

I've updated the ticket to reflect the proposal. Default to psycopg2-binary to maintain consistency and avoid compiling psycopg2 every time; offer psycopg2 as an extra.

dbeatty10 commented 7 months ago

We'll want to add some kind of guidance in the v1.8 migration guide.

We don't have the DBT_PSYCOPG2_NAME environment variable documented in that many places:

So we could probably just say something like this:

Some folks may be using the DBT_PSYCOPG2_NAME environment variable to install psycopg2 instead of psycopg2-binary, possibly like this:

DBT_PSYCOPG2_NAME=psycopg2 pip install dbt

This has been replaced with the following instead:

pip install dbt-postgres[compile]
mikealfare commented 7 months ago

I added the user docs label. I feel like we should add that on the dbt-postgres section of docs.getdbt.com.

martynydbt commented 5 months ago

Closing for Colin

dbeatty10 commented 5 months ago

Assuming https://github.com/dbt-labs/dbt-postgres/pull/41 resolved this?

If so, will there actually be a behavior change between versions of dbt-postgres that we need to add to a migration guide (requiring user_docs)?

Or does https://github.com/dbt-labs/dbt-postgres/pull/41 just preserve the behavior of dbt-postgres when it was in the dbt-core repo?