Open · TimothyZhang7 opened 9 months ago
Somehow this problem goes away by deleting the temporal and temporal_visibility databases in the Postgres database created by the Airbyte deployment and restarting the instance with the run-ab-platform.sh script.
I'm not sure if it is a definitive fix, but it's worth a try if you run into the same problem.
Similar discussion: https://github.com/airbytehq/airbyte/discussions/30472
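For anyone who wants to try this, here is a rough sketch of the steps above for a default docker compose deployment. The container and service names and the `docker` superuser are assumptions taken from the default setup; adjust them to your environment, and pause all syncs first.

```sh
# Stop the services that hold connections to the Temporal databases
# (container names are assumptions from the default docker compose deployment).
docker stop airbyte-temporal airbyte-worker airbyte-server airbyte-cron

# Drop both Temporal databases in the Postgres container created by the deployment.
docker exec airbyte-db psql -U docker -d postgres -c 'DROP DATABASE temporal;'
docker exec airbyte-db psql -U docker -d postgres -c 'DROP DATABASE temporal_visibility;'

# Restart Airbyte; Temporal recreates and re-migrates both databases on startup.
./run-ab-platform.sh
```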
We experienced this issue as well with Helm chart version 0.50.20 in multiple environments. Completing these steps resolved it for us:
@marcosmarxm we're seeing this ever since we updated from 0.44.0 to 0.57.1, OSS. The Airbyte installation is unstable and I think this is connected:
- The `airbyte-worker` service went into a reboot loop last Friday
- The logs are never rotated and quickly use up all of the disk

What might be the downsides of @joeybenamy's approach with deleting the temporal databases?
> @marcosmarxm we're seeing this ever since we updated from 0.44.0 to 0.57.1, OSS. The Airbyte installation is unstable and I think this is connected:
> - The `airbyte-worker` service went into a reboot loop last Friday
> - The logs are never rotated and quickly use up all of the disk
>
> What might be the downsides of @joeybenamy's approach with deleting the temporal databases?
After deleting the temporal databases, there is a chance of some running sync jobs getting stuck. More specifically, they can no longer be run or canceled. AFAIK you will have to reset the connector to fix it.
@TimothyZhang7 thanks! Actually, it went mostly OK. I saw a few log entries about a mismatch for some of the running sync job statuses, but it's been running smoothly ever since.
That said, we still have the same problem with log rotation; it didn't go away.
> @marcosmarxm we're seeing this ever since we updated from 0.44.0 to 0.57.1, OSS. The Airbyte installation is unstable and I think this is connected:
> - The `airbyte-worker` service went into a reboot loop last Friday
> - The logs are never rotated and quickly use up all of the disk
>
> What might be the downsides of @joeybenamy's approach with deleting the temporal databases?

> After deleting the temporal databases, there is a chance of some running sync jobs getting stuck. More specifically, they can no longer be run or canceled. AFAIK you will have to reset the connector to fix it.
Yes, I should have mentioned that we don't do maintenance like this in Airbyte without stopping and pausing all syncs.
Hello all 👋 I reported this to the eng team. @joeybenamy are you still experiencing the issue?
> Hello all 👋 I reported this to the eng team. @joeybenamy are you still experiencing the issue?
We have not encountered this issue in quite some time. Thanks for checking!
@marcosmarxm What was the final recommendation/solution for fixing this issue? Or will an official solution be included in the next release?
@marcosmarxm I have upgraded to 0.60.0, but I am still facing the rate limit error.
I increased some Temporal config values, which I got from the Temporal community, and reduced the number of workers (10 → 3). The error disappeared.
https://community.temporal.io/t/resource-exhausted-namespace-rate-limit-exceeded-for-cron-job/7583
```yaml
# when modifying, remember to update the docker-compose version of this file in temporal/dynamicconfig/development.yaml
frontend.namespaceCount:
  - value: 4096
    constraints: {}
frontend.namespaceRPS.visibility:
  - value: 100
    constraints: {}
frontend.namespaceBurst.visibility:
  - value: 150
    constraints: {}
frontend.namespaceRPS:
  - value: 76800
    constraints: {}
```
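In case it helps anyone on a docker compose (rather than Helm) deployment: the comment at the top of that snippet points at where these overrides live in the compose distribution. A minimal sketch of applying them, assuming the default file layout and service name:

```sh
# Add the overrides above to the dynamicconfig file shipped with the compose distribution,
# then restart Temporal so it reloads the file (the service name is an assumption).
$EDITOR temporal/dynamicconfig/development.yaml
docker compose restart airbyte-temporal
```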
@sivankumar86 did you add these values to the ./temporal/dynamicconfig/development.yaml file? When I add these values, Airbyte fails to start correctly, throwing a ton of "Failed to resolve name" errors.
After upgrading to 0.60.0, we still encounter this. If it's related to the number of workers, here is our config: `MAX_SYNC_WORKERS=10 MAX_SPEC_WORKERS=10 MAX_CHECK_WORKERS=10 MAX_DISCOVER_WORKERS=10 MAX_NOTIFY_WORKERS=5 SHOULD_RUN_NOTIFY_WORKFLOWS=true`
@walker-philips I meant the replica count. Find my config file below for reference, if it helps. Verify it using:

```sh
k describe cm airbyte-oss-temporal-dynamicconfig  # "airbyte-oss" is the name of the deployment
```
```yaml
worker:
  enabled: true
  replicaCount: 3
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "common.names.fullname" . }}-dynamicconfig
  labels:
    {{- include "airbyte.labels" . | nindent 4 }}
data:
  "development.yaml": |
    # when modifying, remember to update the docker-compose version of this file in temporal/dynamicconfig/development.yaml
    frontend.namespaceCount:
      - value: 4096
        constraints: {}
    frontend.namespaceRPS.visibility:
      - value: 100
        constraints: {}
    frontend.namespaceBurst.visibility:
      - value: 150
        constraints: {}
    frontend.namespaceRPS:
      - value: 76800
        constraints: {}
    frontend.enableClientVersionCheck:
      - value: true
        constraints: {}
    history.persistenceMaxQPS:
      - value: 3000
        constraints: {}
    frontend.persistenceMaxQPS:
      - value: 5000
        constraints: {}
    frontend.historyMgrNumConns:
      - value: 30
        constraints: {}
    frontend.throttledLogRPS:
      - value: 200
        constraints: {}
    history.historyMgrNumConns:
      - value: 50
        constraints: {}
    system.advancedVisibilityWritingMode:
      - value: "off"
        constraints: {}
    history.defaultActivityRetryPolicy:
      - value:
          InitialIntervalInSeconds: 1
          MaximumIntervalCoefficient: 100.0
          BackoffCoefficient: 2.0
          MaximumAttempts: 0
    history.defaultWorkflowRetryPolicy:
      - value:
          InitialIntervalInSeconds: 1
          MaximumIntervalCoefficient: 100.0
          BackoffCoefficient: 2.0
          MaximumAttempts: 0
    # Limit for responses. This mostly impacts discovery jobs since they have the largest responses.
    limit.blobSize.error:
      - value: 15728640 # 15MB
        constraints: {}
    limit.blobSize.warn:
      - value: 10485760 # 10MB
        constraints: {}
```
@walker-philips Could you restart the temporal pod after applying the changes, if you have not done so yet?
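For reference, a minimal sketch of that restart on Kubernetes; the namespace and deployment name below (derived from the `airbyte-oss` release name used earlier) are assumptions:

```sh
# Restart the Temporal deployment so the new pod picks up the updated dynamicconfig ConfigMap.
kubectl -n airbyte rollout restart deployment airbyte-oss-temporal
kubectl -n airbyte rollout status deployment airbyte-oss-temporal  # wait for the new pod to become ready
```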
@sivankumar86 Could you please explain how to inject new key-value pairs into the Temporal dynamicconfig ConfigMap via the Helm chart? I don't think it is supported by the Helm chart.
@msenmurugan I download the Helm chart and modify it before deploying it in the CI/CD pipeline.
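For anyone wanting to do the same, this is roughly what that workflow looks like; the chart repo URL, the chart version placeholder, and the release/namespace names are assumptions, and the exact template path varies between chart versions:

```sh
# Pull a local copy of the chart, edit the Temporal dynamicconfig ConfigMap template,
# and deploy the modified local copy instead of the upstream chart.
helm repo add airbyte https://airbytehq.github.io/helm-charts
helm pull airbyte/airbyte --version <chart-version> --untar
# edit the dynamicconfig template inside the unpacked chart, then:
helm upgrade --install airbyte ./airbyte -n airbyte -f values.yaml
```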
@marcosmarxm any update on this issue? We have a similar issue each time we upgrade the Airbyte version. For now, I have to:
Topic: Temporal issue
Relevant information: Airbyte version 0.50.21
We are observing an abnormal amount of rate limit errors from airbyte-cron. We are not using Airbyte schedulers; only one cron job is set up in the Airbyte UI.
The following error message is emitted every few seconds as soon as we start Docker Compose.