airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Airbyte on K8s/helm: Add registry prefix for all image tag references #21123

Open seanglynn-thrive opened 1 year ago

seanglynn-thrive commented 1 year ago

Tell us about the problem you're trying to solve

We are currently running Airbyte via Helm in our internal AKS cluster. Our company enforces an AKS policy that blocks us from pulling any public image whose reference lacks an explicit registry prefix. For example: if we want to deploy the popular busybox container within our K8s environment:

We succeeded in deploying all of the Airbyte infra components (server, webapp, temporal, etc.) by overriding some Helm values and K8s definitions. However, we are now stuck: the same policy blocks us when we try to configure our connectors through the UI.

When we configure a connector (Postgres, for example), our Airbyte service account cannot pull the connector container image, so we cannot deploy the connector.

In the UI we get an exception message: "Sorry. Something went wrong... "

When we check the logs, we can see that the worker pod was unable to spin up the containers that Airbyte needs to configure a source:

Caused by: io.temporal.failure.ApplicationFailure: message='Failure executing: POST at: https://10.0.0.1/api/v1/namespaces/airbyte-namespace/pods.
Message: Forbidden! Configured service account doesn't have access. 
Service account may have been revoked. admission webhook "validation.gatekeeper.sh" denied the request: [azurepolicy-...] 
Container image airbyte/source-postgres:1.0.35 for container main has not been allowed

As the exception states, we cannot pull the image reference airbyte/source-postgres:1.0.35. If there were some way to prepend the docker.io/ registry prefix or pass a default registry address, we might be able to continue with our Airbyte deployment.
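To make the request concrete, here is a minimal Python sketch of the prefixing logic we have in mind. It is not Airbyte code: the function names are hypothetical, and the registry check is the common heuristic (the first path component counts as a registry if it contains a dot or a colon, or is `localhost`), which we assume is good enough for the images listed in this issue.

```python
def has_explicit_registry(image: str) -> bool:
    """Return True if the first path component looks like a registry host."""
    first, sep, _ = image.partition("/")
    if not sep:
        # No slash at all (e.g. "busybox:1.28"), so there is no registry part.
        return False
    return "." in first or ":" in first or first == "localhost"


def qualify_image(image: str, registry: str = "docker.io") -> str:
    """Prepend `registry` to an image reference that lacks one (hypothetical helper)."""
    if has_explicit_registry(image):
        return image  # already fully qualified, leave untouched
    return f"{registry.rstrip('/')}/{image}"


if __name__ == "__main__":
    for ref in ("airbyte/source-postgres:1.0.35", "busybox:1.28"):
        print(qualify_image(ref))
```

Note that some mirrors may additionally expect the `library/` namespace for official images such as busybox; the sketch does not handle that case.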

Other similar errors that occur:


[azurepolicy-k8sazure...] Container image alpine/socat:1.7.4.3-r0 for container relay-stderr has not been allowed.
[azurepolicy-k8sazure...] Container image alpine/socat:1.7.4.3-r0 for container relay-stdout has not been allowed.
[azurepolicy-k8sazure...]  Container image busybox:1.28 for container init has not been allowed.
[azurepolicy-k8sazure...] Container image curlimages/curl:7.83.1 for container call-heartbeat-server has not been allowed..', type='io.fabric8.kubernetes.client.KubernetesClientException', nonRetryable=false

Describe the solution you’d like

Ideally, we would like to set a global image prefix which would cascade through all of the Airbyte components and would be used in all image pulls throughout the helm chart and Airbyte image pull operations.
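As a rough illustration of the cascade, the Python sketch below applies one prefix to every container and init container in a pod spec, which is where the dynamic images (main, relay-stderr, relay-stdout, init, call-heartbeat-server) currently get rejected. The `IMAGE_REGISTRY_PREFIX` variable and both function names are hypothetical, not existing Airbyte settings or APIs, and the pod spec is a plain dict standing in for whatever the worker actually builds.

```python
import os


def with_prefix(image: str, prefix: str) -> str:
    """Prepend `prefix` unless the image already names a registry (hypothetical helper)."""
    first, sep, _ = image.partition("/")
    qualified = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if qualified or not prefix:
        return image
    return f"{prefix.rstrip('/')}/{image}"


def apply_prefix(pod_spec: dict, prefix: str) -> dict:
    """Return a copy of `pod_spec` with the prefix applied to every container image."""
    patched = dict(pod_spec)
    for key in ("initContainers", "containers"):
        if key in pod_spec:
            patched[key] = [
                {**container, "image": with_prefix(container["image"], prefix)}
                for container in pod_spec[key]
            ]
    return patched


if __name__ == "__main__":
    # Hypothetical setting; Airbyte does not define this variable today.
    prefix = os.environ.get("IMAGE_REGISTRY_PREFIX", "docker.io")
    spec = {
        "initContainers": [{"name": "init", "image": "busybox:1.28"}],
        "containers": [
            {"name": "main", "image": "airbyte/source-postgres:1.0.35"},
            {"name": "relay-stderr", "image": "alpine/socat:1.7.4.3-r0"},
        ],
    }
    for group in apply_prefix(spec, prefix).values():
        for container in group:
            print(container["name"], container["image"])
```

A single setting applied at the point where pods are created would cover both the chart-managed components and the containers launched during connector configuration.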

Describe the alternative you’ve considered or used

Overriding the Helm chart / K8s values manually allowed us to add the image prefix and deploy all of the infra components, but not the dynamic containers that Airbyte spins up during configuration operations.

Are you willing to submit a PR?

Yes, I would be willing to submit a PR once I can get some assistance from a core Airbyte developer or a community dev 🚀

k0t3n commented 1 year ago

Hello @seanglynn-thrive! Did you solve it?

mfsiega-airbyte commented 1 year ago

We're looking into how this could be supported. We'll update within a few weeks with a recommendation.

Anna-Katona commented 1 year ago

Hey! We use our own image registry, and we'd like to be able to override it too. Would it be possible to add an env var for this case as well?

seuf commented 3 months ago

On July 15, Docker Hub will enforce rate limiting on pulls from Google IPs. https://cloud.google.com/blog/products/containers-kubernetes/mitigating-the-impact-of-new-docker-hub-pull-request-limits?hl=en

To avoid that, we have deployed a Docker Hub mirror in Google Artifact Registry and we are using it for all our images.

But for Airbyte's dynamic containers, there is no environment variable to override the container registry.

ana-cafemedia commented 2 months ago

Encountering the same issue in AWS. We'd like to configure an ECR pull-through cache for Docker images to avoid the rate limit, but there is no good way to do that for Airbyte source/destination containers. Please provide a way to override the default docker.io URL.

cgreene-pax commented 3 weeks ago

We're having a similar issue, with a twist: the only images we can use are the ones mirrored to our internal ECR repo.