airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Support for docker run's "--add-host" argument #9693

Closed rnagy90 closed 3 weeks ago

rnagy90 commented 2 years ago

It would be nice to be able to add extra hosts to the containers started by the airbyte-worker. Because you cannot "hack" this argument into the DOCKER_NETWORK environment variable after the network name, it is a little tricky to reach applications running on the host machine. The most common suggestion is to add the well-known "magic" domain to the container's /etc/hosts file like this:

--add-host host.docker.internal:host-gateway

or in compose

extra_hosts:
  - "host.docker.internal:host-gateway"

After this you can reference the host machine's network via the host.docker.internal domain.

Example: The Airbyte stack is running in a Docker environment, but the PostgreSQL database is running on the host machine (for security and maintenance reasons, and for non-Docker-based applications). The configured Airbyte destination is a PostgreSQL one and we want to reach it from the local network.

My suggestion is to add it as an optional environment variable and I think the most convenient name for this is _EXTERNALHOSTS.
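To illustrate the suggestion, here is a minimal Python sketch of how a worker could translate such a variable into docker run arguments. This is not Airbyte's code: the variable name EXTRA_HOSTS, the comma-separated format, and both function names are assumptions made up for this example.

```python
import os


def add_host_args(raw: str) -> list[str]:
    """Turn 'name:ip,name2:ip2' into ['--add-host', 'name:ip', ...]."""
    args: list[str] = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty items such as a trailing comma
        name, sep, target = entry.partition(":")
        if not sep or not name or not target:
            raise ValueError(f"expected host:ip, got {entry!r}")
        args += ["--add-host", entry]
    return args


def build_run_command(image: str, env=os.environ) -> list[str]:
    # EXTRA_HOSTS is a hypothetical variable name, not an existing Airbyte setting.
    extra = add_host_args(env.get("EXTRA_HOSTS", ""))
    return ["docker", "run", "--rm", *extra, image]
```

With EXTRA_HOSTS set to host.docker.internal:host-gateway, the resulting command would carry the same --add-host flag shown above; with the variable unset, the command is unchanged.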

Workaround: If someone wants a workaround for this, you can use a custom external network and its gateway to reach applications on the host machine.

1. Create a Docker network manually.
2. Set the DOCKER_NETWORK environment variable in the worker service to the name of the previously created network.
3. Allow the network's Gateway value in pg_hba.conf.
4. Configure the PostgreSQL source/destination host with the same Gateway value.
5. (Optionally) if you have to, add every container in the stack to the network.
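The Gateway value used in steps 3 and 4 can be read from the JSON that docker network inspect prints. A small Python sketch of that lookup follows; the network name airbyte_ext and the addresses in the sample are made up for illustration, and the function name is hypothetical.

```python
import json


def network_gateways(inspect_output: str) -> list[str]:
    """Collect the IPAM gateway addresses from `docker network inspect` JSON."""
    gateways: list[str] = []
    for network in json.loads(inspect_output):
        # "Config" may be missing or null for some network drivers.
        for cfg in network.get("IPAM", {}).get("Config") or []:
            if "Gateway" in cfg:
                gateways.append(cfg["Gateway"])
    return gateways


# Illustrative sample only; real output contains many more fields.
sample = """
[{"Name": "airbyte_ext",
  "IPAM": {"Config": [{"Subnet": "172.28.0.0/16", "Gateway": "172.28.0.1"}]}}]
"""
gateway = network_gateways(sample)[0]
```

In practice the input would come from running docker network create airbyte_ext followed by docker network inspect airbyte_ext, and the returned address is what goes into pg_hba.conf and the connector's host field.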

tmotyl commented 6 months ago

How do you "set the DOCKER_NETWORK environment variable in the worker service as the previously created network's name"? If I'm not mistaken, Airbyte creates a new container to run the postgres connector, so it's not using the worker container directly? In the logs I see:

airbyte-worker                    | 2024-03-20 09:03:33 platform > Creating docker container = destination-postgres-check-f90363b0-79aa-4eb5-9950-d4a7f07c02c1-0-smzpo with resources io.airbyte.config.ResourceRequirements@676e95e[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
airbyte-worker                    | 2024-03-20 09:03:33 platform > Preparing command: docker run --rm --init -i -w /data/f90363b0-79aa-4eb5-9950-d4a7f07c02c1/0 --log-driver none --name destination-postgres-check-f90363b0-79aa-4eb5-9950-d4a7f07c02c1-0-smzpo --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/destination-postgres:2.0.4 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e AIRBYTE_ROLE=dev -e WORKER_ENVIRONMENT=DOCKER -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.53.1 -e WORKER_JOB_ID=f90363b0-79aa-4eb5-9950-d4a7f07c02c1 airbyte/destination-postgres:2.0.4 check --config source_config.json
tmotyl commented 6 months ago

After investigation I found out that the destination-postgres container (which is created on the fly) is connected to the "host" network, so using "localhost" as the host worked.
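For anyone wanting to check the same thing in their own logs, here is a rough Python sketch that pulls the --network value out of a logged "Preparing command" line. The function name is hypothetical, and the command string is shortened from the log above.

```python
import shlex


def docker_network(command: str):
    """Return the value passed to --network in a docker run command, or None."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens):
        if tok == "--network" and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith("--network="):
            return tok.split("=", 1)[1]
    return None


# Shortened from the worker log line quoted earlier in this thread.
logged = (
    "docker run --rm --init -i --log-driver none "
    "--network host -v airbyte_workspace:/data "
    "airbyte/destination-postgres:2.0.4 check --config source_config.json"
)
network = docker_network(logged)
```

If the result is host, the connector shares the host's network stack, which is why localhost reaches a database running on the host machine.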

davinchia commented 3 weeks ago

No longer relevant since we deprecated docker.