Open marcosmarxm opened 2 years ago
Comment made from Zendesk by Marcos Marx on 2022-10-10 at 15:02:
Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:
* It is best to use our topic creation template; if you haven’t yet, we recommend posting a follow-up with the requested information. With that information the team will be able to more quickly search for similar issues with connectors and the platform, and to troubleshoot your specific question or problem faster.
* Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
* Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
* We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.
Thank you for your time and attention.
Best,
The Community Assistance Team
Comment made from Zendesk by Marcos Marx on 2022-10-12 at 14:40:
+1
I’m getting the exact same error with a very similar deployment on AWS EKS, using custom S3 logging. This was working with our previous version of Airbyte, v0.38.3-alpha, and started failing after the upgrade to v0.40.14.
A resolution here that does not involve reverting back to MINIO would be much appreciated.
Comment made from Zendesk by Sunny on 2022-10-14 at 21:05:
Hey there, I previously created an issue requesting better documentation on these config options; please add a thumbs up and comment with any other info you'd like to add: https://github.com/airbytehq/airbyte-internal-issues/issues/1007
Are you deploying with helm or kustomize?
I am seeing here that S3 should be an acceptable config option for WORKER_STATE_STORAGE_TYPE:
https://github.com/airbytehq/airbyte/blob/master/airbyte-workers/src/main/java/io/airbyte/workers/config/CloudStorageBeanFactory.java#L84
Created https://github.com/airbytehq/airbyte/issues/18016 to track this issue
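For what it's worth, a sketch of the env vars this seems to imply for S3-backed state storage (variable names are the ones cited later in this thread and in EnvConfigs; not verified against every release):

```
WORKER_STATE_STORAGE_TYPE=S3
STATE_STORAGE_S3_BUCKET_NAME=<state bucket>
STATE_STORAGE_S3_REGION=<bucket region>
```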
Comment made from Zendesk by Sunny on 2022-10-14 at 21:20:
Hello, I see that there have been some updates.
Please check to make sure you have these envs filled out (example in Helm): https://github.com/airbytehq/airbyte/blob/master/charts/airbyte/values.yaml#L23
state:
  ## state.storage.type Determines which state storage will be utilized. One of "MINIO", "S3" or "GCS"
  storage:
    type: "S3"
logs:
  ## logs.accessKey.password Logs Access Key
  ## logs.accessKey.existingSecret
  ## logs.accessKey.existingSecretKey
  accessKey:
    password: ""
    existingSecret: ""
    existingSecretKey: ""
  ## logs.secretKey.password Logs Secret Key
  ## logs.secretKey.existingSecret
  ## logs.secretKey.existingSecretKey
  secretKey:
    password: ""
    existingSecret: ""
    existingSecretKey: ""
  ## logs.storage.type Determines which log storage will be utilized. One of "MINIO", "S3" or "GCS"
  ## Used in conjunction with logs.minio.*, logs.s3.* or logs.gcs.*
  storage:
    type: "s3"
  ## logs.minio.enabled Switch to enable or disable the Minio helm chart
  minio:
    enabled: false
  ## logs.externalMinio.enabled Switch to enable or disable an external Minio instance
  ## logs.externalMinio.host External Minio Host
  ## logs.externalMinio.port External Minio Port
  ## logs.externalMinio.endpoint Fully qualified hostname for s3-compatible storage
  externalMinio:
    enabled: false
    host: localhost
    port: 9000
  ## logs.s3.enabled Switch to enable or disable custom S3 Log location
  ## logs.s3.bucket Bucket name where logs should be stored
  ## logs.s3.bucketRegion Region of the bucket (must be empty if using minio)
  s3:
    enabled: false
    bucket: airbyte-dev-logs
    bucketRegion: ""
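Distilled down, a minimal override sketch for custom S3 logging, mirroring the excerpt above (bucket values are illustrative; credentials go in via the accessKey/secretKey fields shown above):

```yaml
# Minimal sketch for S3-backed state and logs (values are placeholders)
state:
  storage:
    type: "S3"
logs:
  storage:
    type: "s3"
  minio:
    enabled: false
  s3:
    enabled: true
    bucket: <your-log-bucket>
    bucketRegion: <your-bucket-region>
```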
Comment made from Zendesk by Marcos Marx on 2022-10-24 at 18:44:
Are you deploying with helm or kustomize?
We are deploying it with kustomize - I provided a link above to the Airbyte documentation that discusses the environment variables in the context of the kustomize config files. Unfortunately, the example with Helm variables does not apply to us. Could you provide an example with the kustomize configs here: airbyte/kube/overlays/stable at master · airbytehq/airbyte · GitHub? Thank you!
Comment made from Zendesk by Marcos Marx on 2022-11-05 at 19:19:
I’m in the same situation and would like to know how to get this working with kustomize.
Comment made from Zendesk by Marcos Marx on 2022-11-09 at 23:36:
Hey all!
I was having the same issue. I’m using Helm and am not super familiar with kustomize, but hopefully this helps. I had to set a couple more values in my values.yaml file to get it to work:
global:
  # ...
  logs:
    accessKey:
      password: <access_key_id>
      # Downstream charts don't use the secret created by the password above, so we need to pass in the secret info ourselves
      existingSecret: <helm_release_name>-airbyte-secrets
      existingSecretKey: AWS_ACCESS_KEY_ID
    secretKey:
      password: <secret_access_key>
      # Downstream charts don't use the secret created by the password above, so we need to pass in the secret info ourselves
      existingSecret: <helm_release_name>-airbyte-secrets
      existingSecretKey: AWS_SECRET_ACCESS_KEY
Dug into the code and found out that basically the airbyte-worker and airbyte-server deployment.yaml files only set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables if existingSecret and existingSecretKey are set, or if minio or externalMinio is enabled. There’s nothing there if I’m just passing in the password myself.
For your situation, I assume AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY aren’t being set properly on the worker/server for some reason. Hope that helps!
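To make that concrete, this is roughly the env entry shape the worker/server Deployments should end up with once existingSecret / existingSecretKey are populated (a sketch following the secretKeyRef pattern used elsewhere in this thread, not lifted from the chart templates):

```yaml
# Sketch of the rendered container env; the secret name/keys follow the values.yaml snippet above
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: <helm_release_name>-airbyte-secrets
      key: AWS_ACCESS_KEY_ID
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: <helm_release_name>-airbyte-secrets
      key: AWS_SECRET_ACCESS_KEY
```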
Yes, logging to S3 still doesn't work in the new Airbyte 0.40.xx versions as of now.
Comment made from Zendesk by Marcos Marx on 2022-12-01 at 08:33:
We’re also attempting to upgrade to 0.40.22 with kustomize and run into the exact same problem with the worker as stated here. We’ve been using S3 for logging instead of Minio as well.
What should be the course of action here? Stay stuck on a version before the WORKER_* vars were introduced, like 0.40.6?
I am trying to set up Airbyte in PROD with the latest Helm chart version (0.42.0) and with Kube; both are failing for the above reason, and neither the server nor the workers are spinning up. I have spent more than a week now trying to get this fixed with different options, to no avail.
Comment made from Zendesk by Marcos Marx on 2022-12-06 at 20:21:
Hello Oleg Gusak, it's been a while without an update from us. Are you still having problems or did you find a solution?
Comment made from Zendesk by Marcos Marx on 2022-12-22 at 05:04:
I am also stuck on this same problem.
Is there any update on any solution?
Comment made from Zendesk by Marcos Marx on 2023-01-06 at 17:21:
This is still an issue when using the kustomize overlays for version 0.40.26. Oddly, the Helm chart works correctly for this (there are other things broken there, which is why I’m trying the kustomize overlays), so there’s probably a workaround.
Comment made from Zendesk by Marcos Marx on 2023-01-06 at 19:23:
I have confirmed a workaround to get this fixed in version 0.40.26. In order to configure S3 logs correctly using the kustomize overlays, you need to follow the instructions found here as well as set WORKER_LOGS_STORAGE_TYPE=S3. Note that WORKER_STATE_STORAGE_TYPE needs to remain unchanged.
Comment made from Zendesk by Marcos Marx on 2023-01-11 at 15:14:
We are using Kustomize and our Airbyte version is 0.40.23. The issue we are seeing is that we failed to set a custom S3 bucket as the state storage bucket. The workaround right now is to turn minio back on just for state information. I put up a [fix](https://github.com/airbytehq/airbyte/pull/19486) before, based on the limited knowledge I have.
Comment made from Zendesk by Marcos Marx on 2023-01-23 at 21:51:
Hi - I’m still having some trouble with this and wondered if you could confirm your setup.
Env overlay:
S3_LOG_BUCKET=<your_s3_bucket_to_write_logs_in>
S3_LOG_BUCKET_REGION=<your_s3_bucket_region>
# Set this to empty.
S3_MINIO_ENDPOINT=
# Set this to empty.
S3_PATH_STYLE_ACCESS=
WORKER_LOGS_STORAGE_TYPE=S3
# leave as is, for me, defaults to MINIO
# WORKER_STATE_STORAGE_TYPE=
Secrets overlay:
AWS_ACCESS_KEY_ID=<your_aws_access_key_id>
AWS_SECRET_ACCESS_KEY=<your_aws_secret_access_key>
And that’s it? I’ve tried this on v0.40.28 and v0.40.26 but I’m still getting the same issue as the original post.
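For comparison, a sketch of how those overlay files are typically wired into the manifests (assuming a stock kube/overlays/stable-style layout; the generator names match the airbyte-env / airbyte-secrets references used later in this thread):

```yaml
# kustomization.yaml (sketch)
configMapGenerator:
  - name: airbyte-env
    envs:
      - .env
secretGenerator:
  - name: airbyte-secrets
    envs:
      - .secrets
```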
Comment made from Zendesk by Marcos Marx on 2023-02-01 at 01:06:
Thanks @rcheatham-q - your suggestion to set the vars as
WORKER_LOGS_STORAGE_TYPE=S3
WORKER_STATE_STORAGE_TYPE=MINIO
worked for us too.
Comment made from Zendesk by Marcos Marx on 2023-02-06 at 12:36:
Has anyone figured this out for GCS logs?
I’m not convinced that I should have to put in Minio-related values if I only have GCS logs activated.
Comment made from Zendesk by Marcos Marx on 2023-02-06 at 15:10:
Yes; we encountered a similar problem with GCS.
These configuration changes solved the issue for us (note that we are using the k8s manifests directly, not the helm chart):
- In .env, the env var GCS_LOG_BUCKET needs to be set to the log bucket, and the additional variable called STATE_STORAGE_GCS_BUCKET_NAME needs to be set to the state storage bucket. As far as I can tell, STATE_STORAGE_GCS_BUCKET_NAME isn’t documented, but you can see that it is part of the GCS configuration block for the workers: airbyte/application.yml at 7676af5f5fb53542ebaff18a415f9c89db417055 · airbytehq/airbyte · GitHub. The Minio/S3 variables for us are mostly nulled out, so the config variables for logs and storage largely look like so:
# S3/Minio Log Configuration
S3_LOG_BUCKET=
S3_LOG_BUCKET_REGION=
S3_MINIO_ENDPOINT=
S3_PATH_STYLE_ACCESS=
# GCS Log Configuration
GCS_LOG_BUCKET=<log bucket>
STATE_STORAGE_GCS_BUCKET_NAME=<state bucket>
# State Storage Configuration
STATE_STORAGE_MINIO_BUCKET_NAME=
STATE_STORAGE_MINIO_ENDPOINT=
# Cloud Storage Configuration
WORKER_LOGS_STORAGE_TYPE=gcs
WORKER_STATE_STORAGE_TYPE=gcs
- Secondly, the manifests for the workers need to be modified to actually pass the GCS state bucket variables, as they currently do not. In the airbyte-worker deployment (airbyte/worker.yaml at master · airbytehq/airbyte · GitHub), we added the following vars (note that GOOGLE_APPLICATION_CREDENTIALS is reused here, but it is probably better to have separate SA credentials for writing state):
- name: STATE_STORAGE_GCS_BUCKET_NAME
  valueFrom:
    configMapKeyRef:
      name: airbyte-env
      key: STATE_STORAGE_GCS_BUCKET_NAME
- name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
  valueFrom:
    secretKeyRef:
      name: airbyte-secrets
      key: GOOGLE_APPLICATION_CREDENTIALS
Hope this helps.
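For those who would rather not edit worker.yaml in place, the same additions can be carried as a kustomize strategic-merge patch; a minimal sketch (the file name and container name are illustrative and must be adjusted to match the upstream Deployment):

```yaml
# worker-gcs-state-patch.yaml (sketch) - referenced from kustomization.yaml via patchesStrategicMerge
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-worker
spec:
  template:
    spec:
      containers:
        - name: airbyte-worker-container  # must match the container name in the upstream worker.yaml
          env:
            - name: STATE_STORAGE_GCS_BUCKET_NAME
              valueFrom:
                configMapKeyRef:
                  name: airbyte-env
                  key: STATE_STORAGE_GCS_BUCKET_NAME
            - name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
              valueFrom:
                secretKeyRef:
                  name: airbyte-secrets
                  key: GOOGLE_APPLICATION_CREDENTIALS
```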
Comment made from Zendesk by Marcos Marx on 2023-02-07 at 07:45:
kzvezdarov:
- name: STATE_STORAGE_GCS_BUCKET_NAME
  valueFrom:
    configMapKeyRef:
      name: airbyte-env
      key: STATE_STORAGE_GCS_BUCKET_NAME
- name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
  valueFrom:
    secretKeyRef:
      name: airbyte-secrets
      key: GOOGLE_APPLICATION_CREDENTIALS
Thanks a lot! It works!
Comment made from Zendesk by Marcos Marx on 2023-02-16 at 06:14:
Hey, which version of Airbyte are you working with?
Have you tested 0.40.30+ and had issues with GCS logging?
cf. Cloud Storage Configs are null for GCS logs storage type - airbytehq/airbyte#2 by marcosmarxm
Comment made from Zendesk by Marcos Marx on 2023-02-16 at 14:34:
Currently on 0.40.30, though the version is a bit irrelevant in my case - I use the manifests defined here: airbyte/kube at master · airbytehq/airbyte · GitHub, with the modification to worker.yaml from my post above. I never had an issue with GCS logging, but an issue with workers writing state to GCS - because the state bucket and creds are not getting passed in the worker deployment configs.
As far as I can tell, as of the latest commit on master, the manifests still have that issue and require the worker modification posted above for the deployment to function on GCP.
Comment made from Zendesk by Marcos Marx on 2023-04-03 at 23:14:
Closed due to no response from requester.
When I deploy the Helm chart I am able to bring up all the pods and services, but when I try creating any source connection in the Airbyte UI, I see the error below in the worker and server pods and the connection setup fails with a bad response error:
Collecting content into /tmp/toBePublished6209292906792156831.tmp before uploading.
Cannot start publish with com.van.logging.aws.S3PublishHelper@703961dc due to error: Cannot start publishing: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; S3 Extended Request ID: null; Proxy: null)
Publishing to S3 (bucket=sample-bucket; key=job-logging/workspace/66cd15e7-0f0c-4a7c-89be-79863ba48b6b/0/logs.log/20230427075426_int-qa-airbyte-worker-59ff685c8f-s76xt_dcce21db15244bf8a191652ceb5710f5): java.lang.RuntimeException: Cannot publish to S3: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; S3 Extended Request ID: null; Proxy: null)
I have also tried using an S3 bucket instead of external Minio; in that case the worker and server pods do not come up, and I see the error below in the logs:
2023-05-06 14:23:00 ERROR i.m.r.Micronaut(handleStartupException):338 - Error starting Micronaut server: null
java.lang.IllegalArgumentException: null
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131) ~[guava-31.1-jre.jar:?]
    at io.airbyte.config.storage.DefaultS3ClientFactory.validateBase(DefaultS3ClientFactory.java:36) ~[io.airbyte.airbyte-config-config-models-0.44.0.jar:?]
    at io.airbyte.config.storage.DefaultS3ClientFactory.validate(DefaultS3ClientFactory.java:31) ~[io.airbyte.airbyte-config-config-models-0.44.0.jar:?]
    at io.airbyte.config.storage.DefaultS3ClientFactory.<init>(DefaultS3ClientFactory.java:24) ~[io.airbyte.airbyte-config-config-models-0.44.0.jar:?]
    at io.airbyte.config.helpers.CloudLogs.createCloudLogClient(CloudLogs.java:51) ~[io.airbyte.airbyte-config-config-models-0.44.0.jar:?]
    at io.airbyte.config.helpers.LogClientSingleton.createCloudClientIfNull(LogClientSingleton.java:226) ~[io.airbyte.airbyte-config-config-models-0.44.0.jar:?]
    at io.airbyte.config.helpers.LogClientSingleton.setWorkspaceMdc(LogClientSingleton.java:213) ~[io.airbyte.airbyte-config-config-models-0.44.0.jar:?]
    at io.airbyte.server.LoggingEventListener.onApplicationEvent(LoggingEventListener.java:34) ~[io.airbyte-airbyte-server-0.44.0.jar:?]
    at io.airbyte.server.LoggingEventListener.onApplicationEvent(LoggingEventListener.java:21) ~[io.airbyte-airbyte-server-0.44.0.jar:?]
    at io.micronaut.context.event.ApplicationEventPublisherFactory.notifyEventListeners(ApplicationEventPublisherFactory.java:262) ~[micronaut-inject-3.8.8.jar:3.8.8]
    at io.micronaut.context.event.ApplicationEventPublisherFactory.access$200(ApplicationEventPublisherFactory.java:60) ~[micronaut-inject-3.8.8.jar:3.8.8]
    at io.micronaut.context.event.ApplicationEventPublisherFactory$2.publishEvent(ApplicationEventPublisherFactory.java:229) ~[micronaut-inject-3.8.8.jar:3.8.8]
    at io.micronaut.http.server.netty.NettyHttpServer.lambda$fireStartupEvents$15(NettyHttpServer.java:587) ~[micronaut-http-server-netty-3.8.8.jar:3.8.8]
    at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
    at io.micronaut.http.server.netty.NettyHttpServer.fireStartupEvents(NettyHttpServer.java:581) ~[micronaut-http-server-netty-3.8.8.jar:3.8.8]
    at io.micronaut.http.server.netty.NettyHttpServer.start(NettyHttpServer.java:298) ~[micronaut-http-server-netty-3.8.8.jar:3.8.8]
    at io.micronaut.http.server.netty.NettyHttpServer.start(NettyHttpServer.java:104) ~[micronaut-http-server-netty-3.8.8.jar:3.8.8]
    at io.micronaut.runtime.Micronaut.lambda$start$2(Micronaut.java:81) ~[micronaut-context-3.8.8.jar:3.8.8]
    at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
    at io.micronaut.runtime.Micronaut.start(Micronaut.java:79) ~[micronaut-context-3.8.8.jar:3.8.8]
    at io.micronaut.runtime.Micronaut.run(Micronaut.java:323) ~[micronaut-context-3.8.8.jar:3.8.8]
    at io.micronaut.runtime.Micronaut.run(Micronaut.java:309) ~[micronaut-context-3.8.8.jar:3.8.8]
    at io.airbyte.server.Application.main(Application.java:15) ~[io.airbyte-airbyte-server-0.44.0.jar:?]
@amankesharwani7 I am getting the same error pattern with an S3 bucket. Are there any tips for launching it with S3 logging instead of external Minio?
If anyone has any suggestions, please comment here; this has been a blocker for quite a long time and I could not find any solution to resolve the issue.
Hey, not sure if this is relevant, but I resolved a very similar issue by defining these env variables manually:
STATE_STORAGE_S3_BUCKET_NAME: "xxx-yyy"
STATE_STORAGE_S3_REGION: "eu-central-1"
As we are using the Helm installation type, this is how it looks in our configuration:
spec:
  chart:
    spec:
      version: "0.50.18"
  # Default values
  # https://github.com/airbytehq/airbyte-platform/blob/main/charts/airbyte/values.yaml
  values:
    global:
      logs:
        s3:
          bucket: xxx-yyy
          bucketRegion: "eu-central-1"
      env_vars:
        STATE_STORAGE_S3_BUCKET_NAME: "xxx-yyy"
        STATE_STORAGE_S3_REGION: "eu-central-1"
I have found these variables in the code: https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-config/config-models/src/main/java/io/airbyte/config/EnvConfigs.java#L262-L300
Thanks for this @gediminas-puksmys-nfq. I just happened to try an upgrade today and immediately ran into this. AFAIK, the worker sub-chart is the only chart that needs these redundant env vars (provided via worker.extraEnv), but they should be bucket-brigaded via global.logs.s3 values.
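For reference, a sketch of what that looks like via the worker sub-chart's extraEnv, reusing the bucket/region from the comment above (not verified against every chart version):

```yaml
worker:
  extraEnv:
    # Redundant state-storage vars the worker still needs until they are derived from global.logs.s3
    - name: STATE_STORAGE_S3_BUCKET_NAME
      value: "xxx-yyy"
    - name: STATE_STORAGE_S3_REGION
      value: "eu-central-1"
```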
Just to note that this appears to be the same solution that remediates airbytehq/airbyte#31988.
Hi,
I'm facing the same issue with chart version 0.64.52 on AWS EKS. Is there any update on a solution?
This GitHub issue is synchronized with Zendesk:
Ticket ID: #2668
Priority: normal
Group: Community Assistance Engineer
Assignee: Sunny
Original ticket description: