airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

Add guidance for avoiding version conflicts with connectors and other Python CLI apps #543

Open aaronsteers opened 1 week ago

aaronsteers commented 1 week ago

Note: The text below is a composite of resources from various locations. We'll continue to evolve it and may move it into its own "docs" page if that helps folks. Please drop a comment below or a "+1" if this works for you, and also if it doesn't work for your use case. Thanks!

We commonly get requests for help resolving Python dependency conflicts with other libraries and Python CLI apps. I'm creating this issue to document some of the considerations, workarounds, and best practices.

Preface: The distinction between "apps" and "libraries"

For this discussion, let's say an "app" is anything with a CLI, whereas a "library" is something you must invoke directly from within Python.

This is important, because while all libraries you are using must coexist in the same Python environment, the same is not true for CLI apps. The best practice for CLI apps is to install them in their own virtual environment. While this can often be cumbersome and manual, there are some helpful tools to streamline it.
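Doing this by hand takes only a few lines of standard-library Python. The sketch below is illustrative and not part of PyAirbyte: the helper names `venv_bin_dir` and `create_isolated_app_env` are hypothetical, and installing a package assumes network access to PyPI.

```python
import os
import subprocess
import venv
from pathlib import Path
from typing import Optional


def venv_bin_dir(venv_dir: Path) -> Path:
    """Directory holding a venv's executables: Scripts/ on Windows, bin/ elsewhere."""
    return venv_dir / ("Scripts" if os.name == "nt" else "bin")


def create_isolated_app_env(venv_dir: Path, package: Optional[str] = None) -> Path:
    """Create a dedicated venv for one CLI app and optionally pip-install it there.

    Hypothetical helper for illustration. Returns the venv's bin directory,
    which is where the app's console script (e.g. `dbt`) ends up.
    """
    # clear=True wipes any previous env at this path so the install starts clean.
    # pip is only bootstrapped when there is actually something to install.
    venv.EnvBuilder(clear=True, with_pip=package is not None).create(venv_dir)
    bin_dir = venv_bin_dir(venv_dir)
    if package is not None:
        python = bin_dir / ("python.exe" if os.name == "nt" else "python")
        # Use the venv's own interpreter so the app's dependencies never touch
        # the environment where your libraries (e.g. PyAirbyte) are installed.
        subprocess.run(
            [str(python), "-m", "pip", "install", "--no-cache-dir", package],
            check=True,
        )
    return bin_dir
```

For example, `create_isolated_app_env(Path.home() / ".venvs" / "dbt", "dbt-core")` would leave the `dbt` executable in that env's bin directory. The tools below automate exactly this bookkeeping, plus putting the app on your PATH.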

Best Practice for installing Python CLI Apps

Whenever you install a CLI app like dbt, the best practice is to give it its own dedicated virtual environment. This provides the most stable experience for the CLI app itself, and it completely decouples the CLI app's version constraints from those of the libraries you use in the same workspace or container.

Streamlining CLI App Installation

There are two very good tools that make CLI app installation just as easy (or almost as easy) as a normal pip install. The options below apply to all Python CLI apps, including tools like dbt and harlequin, as well as (optionally) pre-installing Airbyte connectors like airbyte-source-hubspot.

Using pipx

pipx is the original tool (to my knowledge) and the most widely used. In most cases, you can simply run pip install pipx and then pipx install my-tool. The pipx syntax is intentionally as similar as possible to pip's, so many tools can be installed into their own dedicated virtual environment simply by replacing the word pip with pipx. (pipx also comes standard on many Python images now, so you might not need to pre-install it.)
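After a pipx install, the app should be callable by name. pipx links each app's executables into one shared bin directory (`~/.local/bin` by default, configurable via `PIPX_BIN_DIR`), and `pipx ensurepath` adds that directory to your PATH. The hypothetical helper `find_cli` below sketches a quick check from Python: it searches PATH first, then falls back to pipx's default directory, assuming you have not changed `PIPX_BIN_DIR`.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Assumption: pipx's default bin directory; adjust if you set PIPX_BIN_DIR.
DEFAULT_PIPX_BIN_DIR = Path.home() / ".local" / "bin"


def find_cli(name: str, extra_dirs: Iterable[Path] = (DEFAULT_PIPX_BIN_DIR,)) -> Optional[str]:
    """Return the full path of the CLI `name`, or None if it cannot be found.

    Searches PATH first, then each directory in `extra_dirs` in order.
    """
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        # shutil.which accepts an explicit search path instead of $PATH.
        found = shutil.which(name, path=str(directory))
        if found:
            return found
    return None
```

If `find_cli("source-github")` returns a path under `~/.local/bin` but typing `source-github` in a shell fails, that directory is simply missing from your PATH, and running `pipx ensurepath` is the usual fix.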

Using uv and uvx

A newer tool called uv offers a similar uv tool command (with uvx as a shortcut for running a tool) that works much like pipx. It is faster than pipx, but for now it is less widely used and therefore less battle-tested.
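The two tools map onto each other closely. As a sketch (the helper `tool_install_command` is hypothetical), here is how one install translates to each tool, including extra packages that must share the app's environment: recent pipx versions use `--preinstall`, while `uv tool install` uses `--with`. Confirm the flags with `pipx install --help` and `uv tool install --help` for your installed versions.

```python
from typing import List, Sequence


def tool_install_command(
    package: str, extras: Sequence[str] = (), installer: str = "pipx"
) -> List[str]:
    """Build the argv that installs CLI `package` into its own isolated environment.

    `extras` are additional packages that must live in the same environment
    (for example, a dbt adapter alongside dbt-core).
    """
    if installer == "pipx":
        cmd = ["pipx", "install"]
        for extra in extras:
            cmd += ["--preinstall", extra]  # installed into the venv first
        return cmd + [package]
    if installer == "uv":
        cmd = ["uv", "tool", "install", package]
        for extra in extras:
            cmd += ["--with", extra]  # extra package in the same tool env
        return cmd
    raise ValueError(f"unsupported installer: {installer!r}")
```

You would run the result with `subprocess.run(tool_install_command("airbyte-source-github", installer="uv"), check=True)`. For a one-off run without a persistent install, `uvx --from airbyte-source-github source-github spec` executes the connector from a temporary cached environment.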

Common Installation Patterns

Docker-Based Pre-Installs

Here is some sample Dockerfile code from this comment, specifically around pre-installing connectors onto Docker images:

Reported to me by a user:

The trick that worked in Airflow was to use a Dockerfile that handles the isolation of installing the connectors into their own virtualenvs:

# Pre-install the connector(s) in their own virtualenv.
# Call the venv's own pip directly: `source` is a bash builtin that the
# default /bin/sh on many images (e.g. dash on Debian) does not provide.
RUN python -m venv source_github && \
    source_github/bin/pip install --no-cache-dir airbyte-source-github

# ... repeat for other connectors ...

# Test that the executable works and we can find it
RUN source_github/bin/source-github spec

# Go ahead and install PyAirbyte as usual
RUN python -m venv pyairbyte_venv && \
    pyairbyte_venv/bin/pip install --no-cache-dir airbyte==0.10.4
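The `spec` smoke test can also be run from Python, which is handy for confirming a pre-installed connector before pointing PyAirbyte at it. Airbyte connectors speak the Airbyte protocol: `spec` prints newline-delimited JSON messages, and the one with `"type": "SPEC"` carries the connector specification. The helper `read_connector_spec` below is a hypothetical sketch, not a PyAirbyte API.

```python
import json
import subprocess
from typing import Any, Dict


def read_connector_spec(executable: str, timeout: float = 60.0) -> Dict[str, Any]:
    """Run `<executable> spec` and return the payload of its SPEC message.

    Raises RuntimeError if the connector exits non-zero or emits no SPEC message.
    """
    result = subprocess.run(
        [executable, "spec"], capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        raise RuntimeError(f"{executable} spec failed:\n{result.stderr}")
    for line in result.stdout.splitlines():
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text output that is not an Airbyte message
        if isinstance(message, dict) and message.get("type") == "SPEC":
            return message["spec"]
    raise RuntimeError(f"{executable} spec emitted no SPEC message")
```

With the image above, `read_connector_spec("source_github/bin/source-github")` should return a dict containing `connectionSpecification` if the install worked. PyAirbyte's `get_source()` accepts a `local_executable` argument for running an already-installed connector like this; check the docs for your PyAirbyte version.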

If pipx is preinstalled on the image, this is slightly easier:

# pipx handles the virtual-env and auto-adds the connector CLI to PATH:
RUN pipx install airbyte-source-github
RUN pipx install airbyte-source-faker

# Test that the executables work and we can find them on PATH
RUN source-github spec
RUN source-faker spec

# Go ahead and install PyAirbyte as usual
RUN python -m venv pyairbyte_venv && \
    pyairbyte_venv/bin/pip install --no-cache-dir airbyte==0.10.4

Installing dbt

Per this discussion: https://github.com/airbytehq/PyAirbyte/issues/441

This is slightly more involved than a normal pipx install, because dbt needs more than one package (dbt-core plus an adapter such as dbt-postgres) installed into the same virtual environment:

# Install dbt-core and the Postgres adapter into one pipx-managed environment:
pipx install --preinstall=dbt-postgres dbt-core
# Confirm install worked:
dbt --version

Related Issues: