airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

💡 Feature Request: Ability to use database-type sources, such as `source-postgres` and `source-mysql` #87

Open aaronsteers opened 8 months ago

aaronsteers commented 8 months ago

Because most database sources are built in Java, they currently cannot run in Python environments.

This feature would allow database-type sources to be run from PyAirbyte. Possible implementation options could be:

  1. Allow Docker containers to be invoked by Airbyte.
    • Important: While this is technically not a big lift, I'm not sure we want to take this approach, as it would create a very sharp difference in user experience between runtimes that have Docker access and those that do not.
  2. Find a way to package and install Java connectors as standalone executables.
    • This is also technically feasible in theory, but it has its own sharp edges, such as needing pre-built Java executables for n platforms/runtimes.

Available Workarounds

Workaround #1: Pre-installing the Java-based connector

One workaround is to pre-install the Java-based connector on your local machine or Docker image, and then create a CLI that mimics the CLI of a Python-based connector. If that CLI is registered on PATH, PyAirbyte will find the connector and will not know or care what language it is written in.
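A minimal sketch of such a shim, assuming the Java connector's launcher script has been pre-installed at a path of your choosing. The `JAVA_CONNECTOR_LAUNCHER` path and the helper names below are hypothetical. The shim forwards the standard Airbyte connector commands (`spec`, `check`, `discover`, `read`) and their flags unchanged, and lets the connector write its messages straight to stdout.

```python
import subprocess
import sys

# Assumption: location where the Java connector's launcher was pre-installed.
JAVA_CONNECTOR_LAUNCHER = "/opt/airbyte/source-postgres/bin/source-postgres"


def build_command(argv, launcher=JAVA_CONNECTOR_LAUNCHER):
    """Prefix the launcher and pass the Airbyte CLI arguments through unchanged."""
    return [launcher, *argv]


def main(argv=None):
    """Run the Java connector and return its exit code."""
    args = sys.argv[1:] if argv is None else argv
    # stdout and stderr are inherited, so the connector's JSON messages
    # reach the calling process without any re-encoding.
    return subprocess.run(build_command(args)).returncode
```

To use this as the executable itself, save it under the connector's name (for example `source-postgres`), make it executable, put it on PATH, and end the file with `sys.exit(main())`.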

Workaround #2: Treating the source DB as an externally-managed "cache"

An alternative workaround, which admittedly would not solve all use cases, would be what is described in this issue:

natikgadzhi commented 7 months ago

Hypothetically, we could make a wrapper source, `source-docker-wrapper`, that takes a config such as `__injected_source_image`, tries to spin up Docker to run the source, and proxies its output to PyAirbyte. Or we could build this natively into PyAirbyte itself instead of using a proxy source.
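To make the wrapper idea concrete, here is a rough sketch of how the `docker run` command could be assembled. It is not how PyAirbyte implements anything. The function name, `FILE_FLAGS`, and the `/airbyte-files` mount point are all assumptions for illustration. The only non-trivial part is that `--config`, `--catalog`, and `--state` point at host files, which need to be mounted into the container and have their paths rewritten.

```python
from pathlib import Path

# Airbyte connector CLI flags whose value is a path to a file on the host.
FILE_FLAGS = {"--config", "--catalog", "--state"}


def docker_command(image, connector_args, container_dir="/airbyte-files"):
    """Build a `docker run` command, bind-mounting each host file read-only."""
    cmd = ["docker", "run", "--rm"]
    translated = []
    args = iter(connector_args)
    for arg in args:
        translated.append(arg)
        if arg in FILE_FLAGS:
            # The next argument is the host path for this flag.
            host_path = Path(next(args)).resolve()
            container_path = f"{container_dir}/{host_path.name}"
            cmd += ["--volume", f"{host_path}:{container_path}:ro"]
            translated.append(container_path)
    return cmd + [image, *translated]
```

Passing the result to `subprocess.run` with inherited stdout would proxy the source's output to the caller. Note that this sketch mounts files by base name, so two input files with the same name in different directories would collide.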

Pros:

Cons:

Running a Java executable would require the host system to have the right version of Java, so I wonder whether that is any better than requiring Docker. A bit more difficult to manage, I'd say.

aaronsteers commented 5 months ago

Circling back to this issue after a new option has opened up.

Users can now use Docker to run database sources, if they have it available. This feature is in "experimental" status while we gather feedback, but it should unblock use cases that require SQL-type sources or any source written in Java.

More info and documentation are here.

aaronsteers commented 3 months ago

Running Docker sources is now promoted out of "experimental" status and is stable. Note: This only works if you have Docker installed, which we recognize will still not be possible in some environments where you would want to run PyAirbyte.

https://airbytehq.github.io/PyAirbyte/airbyte/sources.html#get_source
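For anyone landing here, a short sketch of what this can look like with `get_source`, assuming Docker is installed and running. The helper names are hypothetical, and the config keys are assumed to match the `source-postgres` connector spec, so please check them against the connector's documentation. Passing `docker_image=True` asks PyAirbyte to use the connector's default image. A specific image name can be passed as a string instead.

```python
def postgres_config(host, database, username, password, port=5432):
    """Assemble a config dict (field names assumed from the source-postgres spec)."""
    return {
        "host": host,
        "port": port,
        "database": database,
        "username": username,
        "password": password,
    }


def read_postgres(config):
    """Run source-postgres through Docker and read all streams into the default cache."""
    import airbyte as ab  # imported lazily so the helper above works without PyAirbyte

    source = ab.get_source(
        "source-postgres",
        config=config,
        docker_image=True,  # run the connector via Docker instead of a Python install
    )
    source.check()  # verify the connection before reading
    source.select_all_streams()
    return source.read()
```

Calling `read_postgres(postgres_config("localhost", "mydb", "user", "secret"))` would then pull the image if needed and run the sync. The connection values here are placeholders.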
