sc-yan opened this issue 8 months ago
We are seeing this error as well. Downgrading to 0.49.6 fixed the issue for us.
We're also experiencing this within GKE on the latest version 0.49.10
Experiencing this same issue on latest V0.49.18. Downgraded to helm chart V0.49.5, works fine now.
We're also running into this if we upgrade the chart past 0.49.6
Same issue with all charts after 0.49.6
After providing some environment variables in values.yaml, I made it work. Here's what part of my configuration looks like:
```yaml
## worker.extraEnv [array] Additional env vars for worker pod(s).
## Example:
##
## extraEnv:
##   - name: JOB_KUBE_TOLERATIONS
##     value: "key=airbyte-server,operator=Equals,value=true,effect=NoSchedule"
extraEnv:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        key: AWS_ACCESS_KEY_ID
        name: airbyte-airbyte-secrets
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: AWS_SECRET_ACCESS_KEY
        name: airbyte-airbyte-secrets
  - name: STATE_STORAGE_S3_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: AWS_ACCESS_KEY_ID
        name: airbyte-airbyte-secrets
  - name: STATE_STORAGE_S3_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: AWS_SECRET_ACCESS_KEY
        name: airbyte-airbyte-secrets
  - name: STATE_STORAGE_S3_BUCKET_NAME
    value: ${STATE_STORAGE_S3_BUCKET_NAME}
  - name: STATE_STORAGE_S3_REGION
    value: ${STATE_STORAGE_S3_REGION}
```
Check here which environment variables you might be missing: https://github.com/airbytehq/airbyte-platform/blob/9ffa4e9f44f06e65fe3b138204367d5da8c98f2c/airbyte-config/config-models/src/main/java/io/airbyte/config/EnvConfigs.java#L133-L142
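As a quick sanity check, you can compare the names from that list against the actual environment of the worker pod (e.g. via `kubectl exec`). This is a hypothetical helper, not part of Airbyte; the variable list is copied from the EnvConfigs link above and can be extended:

```shell
# Report which state-storage variables are unset in the current environment.
# Variable names follow the EnvConfigs list linked above; extend as needed.
required="STATE_STORAGE_S3_BUCKET_NAME STATE_STORAGE_S3_REGION"
required="$required STATE_STORAGE_S3_ACCESS_KEY STATE_STORAGE_S3_SECRET_ACCESS_KEY"

missing=""
for v in $required; do
  # indirect expansion: read the value of the variable named in $v
  eval "val=\${$v:-}"
  [ -z "$val" ] && missing="$missing $v"
done
echo "missing:$missing"
```

Any name printed after `missing:` still needs to be provided via `extraEnv` (or a secret/configMap reference) as shown above.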
@szemek thank you so much for the info! I followed your approach and it worked! Running 0.49.23 now. Anyone who still has issues, please try the approach above. I'm gonna keep the issue open in case someone is looking for an answer, but please feel free to close it if you think no further action is needed.
I was able to resolve it (for maintaining state with S3) by simply adding these two environment variables in the worker section of the values file under extraEnv. I found this here.
Any idea on how to fix it on a EC2 deployment?
@HatemLar helm charts are supposed to be used in k8s. I assume you are deploying Airbyte with docker/etc on EC2? Try setting up the same env variables as above, following this guide: https://docs.airbyte.com/deploying-airbyte/on-aws-ec2
@sc-yan thank you for your help! Yes, deployed with docker on EC2, and we did follow that guide. Do you think we should declare these variables in the instance or in the docker-compose file?
@HatemLar it really depends on how you want to manage your infra/deployment. Generally, docker acts like a VM, so the app is not supposed to read values from the host machine (which is EC2 in your case) unless you mount a volume into docker. It's common to set these env variables in the docker-compose file, but if you do have special cases, feel free to adjust accordingly.
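For a docker compose deployment, a sketch of where these could go (the `worker` service name and the use of an override file are assumptions; match them to the compose file from the EC2 guide):

```yaml
# docker-compose.override.yml (hypothetical; adjust service names to your compose file)
services:
  worker:
    environment:
      # Values are read from the .env file next to the compose file.
      - STATE_STORAGE_S3_BUCKET_NAME=${STATE_STORAGE_S3_BUCKET_NAME}
      - STATE_STORAGE_S3_REGION=${STATE_STORAGE_S3_REGION}
      - STATE_STORAGE_S3_ACCESS_KEY=${AWS_ACCESS_KEY_ID}
      - STATE_STORAGE_S3_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```

Keeping the actual values in the `.env` file keeps credentials out of the compose file itself.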
Using the below works (for GCS) since the values should likely already be in your configMap if you specified them in global.gcs:
```yaml
extraEnv:
  - name: STATE_STORAGE_GCS_BUCKET_NAME
    valueFrom:
      configMapKeyRef:
        key: GCS_LOG_BUCKET
        name: airbyte-airbyte-env
  - name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
    valueFrom:
      configMapKeyRef:
        key: GOOGLE_APPLICATION_CREDENTIALS
        name: airbyte-airbyte-env
```
That fixes the worker pod issue; however, I then ran into the following with the replication orchestrator. This is not seen until you try to sync a connection.
global.gcs.extraEnv doesn't affect the templates.
What I wrote is specific to the worker key in values: worker.extraEnv
Is there a way to make it work with IRSA authentication (service account + IAM role)?
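Not confirmed anywhere in this thread, but the usual IRSA pattern would be to annotate the pods' service account with the role ARN and drop the static keys. Whether this chart version passes the annotation through, and whether the worker falls back to the default AWS credential chain for state storage, would need checking; the ARN below is a placeholder:

```yaml
# Hypothetical values fragment; the serviceAccount key and annotation
# pass-through depend on the chart version.
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/airbyte-state-storage
```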
Hello all 👋 sorry for the missing update here. I shared this with the engineering team and will return here with any update.
Just to note that this appears to be the same solution to remediate #18016.
This works with airbyte/worker:0.50.47 and helm chart 0.53.52:
```yaml
minio:
  enabled: false
worker:
  extraEnv:
    - name: STATE_STORAGE_S3_BUCKET_NAME
      value: "XXYYZZ"
    - name: STATE_STORAGE_S3_REGION
      value: "eu-west-3"
    - name: S3_MINIO_ENDPOINT
      value: ""
global:
  log4jConfig: "log4j2-no-minio.xml"
  state:
    storage:
      type: "S3"
  logs:
    storage:
      type: "S3"
    minio:
      enabled: false
    s3:
      enabled: true
      bucket: "XXYYZZ"
      bucketRegion: "eu-west-3"
    accessKey:
      existingSecret: "airbyte-aws-creds"
      existingSecretKey: "AWS_ACCESS_KEY_ID"
    secretKey:
      existingSecret: "airbyte-aws-creds"
      existingSecretKey: "AWS_SECRET_ACCESS_KEY"
```
I can confirm, the settings from @raphaelauv helped me to start the worker pods again. I'm using helm chart 0.53.120 with airbyte/server:0.50.48. The server pod runs, but had error messages like in the worker logs.
Adding the following to my yml helped to mitigate this:
```yaml
server:
  extraEnv:
    - name: LOG4J_CONFIGURATION_FILE
      valueFrom:
        configMapKeyRef:
          name: airbyte-env
          key: LOG4J_CONFIGURATION_FILE
```
(Duplicate comment as previous issue is closed)
I've been pinning version 0.49.6 to get around this for the past month and a half.
(Running Airbyte OSS on AWS EKS cluster, default values.yaml for ease of replication while trying to fix.)
Trying the fix suggested by @marcosmarxm doesn't work for me. After attempting to upgrade from 0.49.6 -> latest since mid Jan (so 0.50.22+), it has never fixed the issue.
Running the minio config in bash returns:
```shell
helm % kubectl exec -it airbyte-minio-0 bash -n default
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# mc alias set myminio http://localhost:9000 minio minio123
mc: Configuration written to `/tmp/.mc/config.json`. Please update your access credentials.
mc: Successfully created `/tmp/.mc/share`.
mc: Initialized share uploads `/tmp/.mc/share/uploads.json` file.
mc: Initialized share downloads `/tmp/.mc/share/downloads.json` file.
Added `myminio` successfully.
bash-5.1# mc mb myminio/state-storage
mc: <ERROR> Unable to make bucket `myminio/state-storage`. Your previous request to create the named bucket succeeded and you already own it.
```
Not an expert in any of this at all, but it looks like the creation of the bucket isn't entirely the issue. Just wanted to provide additional info as this has been a long-open issue!
Edited to add:
Force removing the bucket (on 0.54.15) seems to show the bucket being forcefully recreated almost instantaneously.
```shell
bash-5.1# mc rb myminio/state-storage
mc: <ERROR> `myminio/state-storage` is not empty. Retry this command with ‘--force’ flag if you want to remove `myminio/state-storage` and all its contents
bash-5.1# mc rb myminio/state-storage --force
Removed `myminio/state-storage` successfully.
bash-5.1# mc mb myminio/state-storage
mc: <ERROR> Unable to make bucket `myminio/state-storage`. Your previous request to create the named bucket succeeded and you already own it.
bash-5.1# mc rb myminio/state-storage --force
Removed `myminio/state-storage` successfully.
bash-5.1# mc rb myminio/state-storage --force
Removed `myminio/state-storage` successfully.
```
Edit again:
This only occurs with the PostgreSQL source connection. Our S3->S3 jobs can run as expected in versions beyond 0.49.6.
What method are you using to run Airbyte?
Kubernetes
Platform Version or Helm Chart Version
helm 0.49.9
What step the error happened?
Upgrading the Platform or Helm Chart
Relevant information
When upgrading helm from 0.49.6 to 0.49.8/0.49.9, the worker pod keeps crashing, but if I revert back to 0.49.6, it's fine.
Relevant log output