airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.07k stars, 4.11k forks

Allow depending on local Python packages without having to inherit Docker images #1134

Closed: sherifnada closed this issue 1 year ago

sherifnada commented 3 years ago

Tell us about the problem you're trying to solve

To use base_python or any other Python package from the monorepo in a Docker image, we currently have to inherit from the base_python Docker image. This is inconvenient when the image in question already needs to inherit from a different base image.

For example, the standard test images inherit from the standard test base image. To use base_singer in the test code, we must either manually copy the Python dependency out of the base_singer Docker image or inherit directly from the base_singer image, which is not always possible.

Acceptance Criteria

We have a way of depending on repo-local Python packages.


jrhizor commented 3 years ago

I looked into this a bit last week. It also overlaps somewhat with our efforts to remove requirements.txt.

There is a mechanism for declaring local packages as dependencies in setup.py (proof of concept: https://github.com/airbytehq/airbyte/compare/jrhizor/poc-no-reqs). However, it requires absolute paths, which are hard to coordinate across local and Docker environments, even when the path is set through environment variables.
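To make the absolute-path problem concrete, here is a minimal sketch assuming PEP 508 direct references (`name @ file:///abs/path`), which setuptools accepts in `install_requires`. The helper `local_requirement`, the package name, and both directory roots are hypothetical and use POSIX paths. The point it illustrates: the requirement string that ends up in the package metadata depends on where the tree happens to live, so a laptop checkout and a container produce different values.

```python
import os
from pathlib import Path


def local_requirement(name: str, relative_path: str, base_dir: str) -> str:
    """Build a PEP 508 direct reference to a package that lives in the repo.

    Hypothetical helper: the reference has to be an absolute file:// URL, so the
    relative path is resolved against base_dir when setup.py runs.
    """
    absolute = os.path.abspath(os.path.join(base_dir, relative_path))
    return f"{name} @ {Path(absolute).as_uri()}"


# In setup.py this would be used roughly as:
#   setup(..., install_requires=[local_requirement("base-python", "../base_python", HERE)])
# where HERE (hypothetical) is the directory containing setup.py.

if __name__ == "__main__":
    # Same relative layout, checked out on a laptop vs. copied into an image
    # (both roots are made-up examples).
    host = local_requirement("base-python", "../base_python", "/home/dev/airbyte/connectors/source-foo")
    container = local_requirement("base-python", "../base_python", "/airbyte/integration_code")
    print(host)       # base-python @ file:///home/dev/airbyte/connectors/base_python
    print(container)  # base-python @ file:///airbyte/base_python
```

Resolving the path at build time keeps setup.py itself portable, but the resolved absolute URL is still baked into whatever gets built, which is where the local-versus-Docker mismatch comes from.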

Meltano recently switched to Poetry, a "modern" package manager for Python. Their justification is interesting: https://gitlab.com/meltano/meltano/-/merge_requests/1964. Together with other articles I read, it made a pretty convincing case for Poetry.

I tried setting up some of our projects with it: https://github.com/airbytehq/airbyte/compare/jrhizor/remove-requirementstxt?expand=1

The Python project setup looks quite a bit neater, and it separates dev and main dependencies out of the box. You can also declare dependencies by relative path, and they resolve correctly.

That said, this is somewhat orthogonal to the main purpose of this issue, which is packaging.
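For contrast with the setup.py approach, Poetry lets a project declare a sibling package by relative path in pyproject.toml, for example `base-python = { path = "../base_python", develop = true }` under `[tool.poetry.dependencies]`. The sketch below is not Poetry's own code: `resolve_path_dependencies` is a hypothetical helper that assumes the dependency table has already been parsed into a dict, and it mimics resolving such entries against the project directory. Because the stored value stays relative, the same declaration points at the right directory wherever the repository is checked out or copied.

```python
import os
from typing import Dict, Union

# A parsed [tool.poetry.dependencies] entry is either a version string or a table.
Dependency = Union[str, Dict[str, object]]


def resolve_path_dependencies(deps: Dict[str, Dependency], project_dir: str) -> Dict[str, str]:
    """Return {name: absolute directory} for every path-based dependency.

    Hypothetical helper: version-string entries such as "^3.7" are skipped, and
    table entries with a "path" key are resolved relative to project_dir.
    """
    resolved = {}
    for name, spec in deps.items():
        if isinstance(spec, dict) and "path" in spec:
            resolved[name] = os.path.abspath(os.path.join(project_dir, str(spec["path"])))
    return resolved


# Assumed example of a parsed dependency table; the declaration itself stays relative.
deps = {
    "python": "^3.7",
    "base-python": {"path": "../base_python", "develop": True},
}

# The same declaration works from any checkout location (roots are made-up examples).
print(resolve_path_dependencies(deps, "/home/dev/airbyte/connectors/source-foo"))
# {'base-python': '/home/dev/airbyte/connectors/base_python'}
print(resolve_path_dependencies(deps, "/airbyte/integration_code"))
# {'base-python': '/airbyte/base_python'}
```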

We could use pex to package all requirements into a single file and copy that file into the Docker image. Pex works best from a requirements.txt file, though, so in the future we would probably want to generate one from Poetry that can be copied into the Docker image.

This would also be much faster, since we would no longer be running pip install multiple times.
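The Poetry-to-pex build step could be sketched roughly as follows, driving both CLIs from Python. This assumes `poetry export` is available (depending on the Poetry version it may require the poetry-plugin-export plugin) and uses pex's `-r` (requirements file) and `-o` (output file) options; the file names `requirements.txt` and `deps.pex` and the helpers `build_commands` and `run` are placeholders, not an existing Airbyte script. By default it only prints the commands rather than executing them.

```python
import subprocess
from typing import List


def build_commands(requirements_file: str = "requirements.txt", pex_file: str = "deps.pex") -> List[List[str]]:
    """Return two commands: Poetry -> requirements file, then requirements file -> one .pex file."""
    export = [
        "poetry", "export",
        "-f", "requirements.txt",      # export format
        "--output", requirements_file,
        "--without-hashes",            # emit plain pins without --hash lines
    ]
    package = ["pex", "-r", requirements_file, "-o", pex_file]
    return [export, package]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in build_commands():
        print(" ".join(argv))
    # run(build_commands())  # only on a machine with poetry and pex installed
```

With this split, the Docker image would only need to copy the single .pex file rather than pip installing each local package in turn.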