airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Airbyte not responding via UI or API #46554

Open henriquemeloo opened 1 month ago

henriquemeloo commented 1 month ago

What happened?

Related issue: https://github.com/airbytehq/airbyte/issues/44833

Airbyte does not respond via UI or via the API.

Temporal logs show errors and warnings like:

```
{"level":"info","ts":"2024-10-07T13:29:15.145Z","msg":"matching client encountered error","service":"frontend","error":"service rate limit exceeded","service-error-type":"serviceerror.ResourceExhausted","logging-call-at":"metric_client.go:219"}
{"level":"info","ts":"2024-10-07T13:29:15.153Z","msg":"history client encountered error","service":"frontend","error":"service rate limit exceeded","service-error-type":"serviceerror.ResourceExhausted","logging-call-at":"metric_client.go:104"}
{"level":"warn","ts":"2024-10-07T13:30:49.145Z","msg":"Per shard per namespace RPS warn limit exceeded","service":"history","shard-id":2,"wf-namespace":"default","rps":65,"logging-call-at":"health_signal_aggregator.go:171"}
{"level":"warn","ts":"2024-10-07T13:30:49.145Z","msg":"Per shard RPS warn limit exceeded","service":"history","shard-id":2,"rps":65,"logging-call-at":"health_signal_aggregator.go:178"}
{"level":"warn","ts":"2024-10-07T13:30:49.146Z","msg":"Per shard per namespace RPS warn limit exceeded","service":"history","shard-id":3,"wf-namespace":"default","rps":115,"logging-call-at":"health_signal_aggregator.go:171"}
{"level":"warn","ts":"2024-10-07T13:31:06.245Z","msg":"Unspecified task queue kind","service":"frontend","wf-task-queue-name":"GET_SPEC","wf-namespace":"default","logging-call-at":"workflow_handler.go:3772"}
```
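The excerpt mixes two signals: ResourceExhausted errors ("service rate limit exceeded") reported by the frontend's matching and history clients, and per-shard RPS warnings from the history service. A minimal Python sketch for tallying both over a longer log capture, assuming Temporal writes one JSON object per line with the field names visible in these lines (`msg`, `error`, `service`, `shard-id`, `rps`); the function name `summarize_temporal_log` is hypothetical.

```python
import json
from collections import Counter


def summarize_temporal_log(lines):
    """Tally rate-limit errors and per-shard RPS warnings in Temporal JSON logs.

    Assumes one JSON object per line with "msg", "error", "service" and,
    for RPS warnings, "shard-id" and "rps" fields. Non-JSON lines are skipped.
    Returns (errors_by_service, peak_rps_by_shard).
    """
    errors_by_service = Counter()  # service -> count of "service rate limit exceeded"
    peak_rps_by_shard = {}         # shard-id -> highest RPS seen in a warn-limit message
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("error") == "service rate limit exceeded":
            errors_by_service[entry.get("service", "?")] += 1
        # Matches both "Per shard RPS ..." and "Per shard per namespace RPS ..." warnings.
        if "RPS warn limit exceeded" in entry.get("msg", "") and "shard-id" in entry:
            shard = entry["shard-id"]
            peak_rps_by_shard[shard] = max(peak_rps_by_shard.get(shard, 0), entry.get("rps", 0))
    return errors_by_service, peak_rps_by_shard
```

It accepts any iterable of lines, such as an open file or `sys.stdin`, e.g. `errors, rps = summarize_temporal_log(open("temporal.log"))`, where `temporal.log` is a hypothetical capture of the Temporal pod's logs.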

We had this problem on a Docker deployment running Airbyte v0.54.0, and it persists after upgrading to Airbyte v1.1.0 deployed with abctl.

What did you expect to happen?

Airbyte to be functional on this version with this deployment method.

Abctl Version

```console
$ abctl version
version: v0.18.0
```

Docker Version

```console
$ docker version
Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:41:00 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:41:00 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.22
  GitCommit:        7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc:
  Version:          1.1.14
  GitCommit:        v1.1.14-0-g2c9f560
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

OS Version

```console
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```

henriquemeloo commented 1 month ago

Should I have deleted the Temporal databases before the migration? They have grown fairly large.

bgroff commented 1 month ago

Possibly. What size instance are you using?

henriquemeloo commented 1 month ago

@bgroff Airbyte is running with abctl on a c5a.4xlarge EC2 instance, with an external database on a db.t4g.micro RDS instance. The database instance's CPU utilization doesn't seem to have peaked above 35%, though.

bgroff commented 1 month ago

That should be plenty :). I don't think the Temporal DB would break the API or UI. Can you try loading the web page with the Network tab open in the browser's developer tools? I'm curious whether all requests are blocked or whether one specific request is getting hung up.

henriquemeloo commented 1 month ago

@bgroff The pending request seems to be `/api/v1/workspaces/get`. I'm not sure it's related to the Temporal database; I only mentioned it because deleting the Temporal databases seems to work around the "service rate limit exceeded" error.
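To reproduce the hang outside the browser, the pending request can be replayed with an explicit timeout. A minimal Python sketch under several assumptions not confirmed in this thread: the instance is reachable at the abctl default address `http://localhost:8000`, `/api/v1/workspaces/get` accepts a POST with a JSON body containing a `workspaceId` field, and a bearer token may be required when authentication is enabled. The function name `probe` is hypothetical.

```python
import json
import socket
import time
import urllib.error
import urllib.request

# Opener with proxies disabled so a local instance is contacted directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, payload, timeout=10.0, token=None):
    """POST `payload` as JSON to `url` and classify what happened.

    Returns (outcome, http_status, elapsed_seconds), where outcome is one of
    "ok", "http_error", "timeout" or "error".
    """
    headers = {"Content-Type": "application/json"}
    if token:  # assumption: bearer auth when the instance has auth enabled
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    start = time.monotonic()
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            response.read()
            return "ok", response.status, time.monotonic() - start
    except urllib.error.HTTPError as exc:  # server answered with a 4xx/5xx status
        return "http_error", exc.code, time.monotonic() - start
    except urllib.error.URLError as exc:  # could not connect, or connect timed out
        timed_out = isinstance(exc.reason, (socket.timeout, TimeoutError))
        return ("timeout" if timed_out else "error"), None, time.monotonic() - start
    except (socket.timeout, TimeoutError):  # connected, but no response in time
        return "timeout", None, time.monotonic() - start
    except OSError:  # e.g. connection reset while reading the body
        return "error", None, time.monotonic() - start
```

For example, `probe("http://localhost:8000/api/v1/workspaces/get", {"workspaceId": "<workspace id>"}, timeout=30)`. If this call returns "timeout" while other endpoints answer quickly, the hang is more likely in how the server handles this request than in general connectivity.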

henriquemeloo commented 3 weeks ago

Does increasing the number of Temporal replicas help with this? If not, is there an easy way to fine-tune the values in Temporal's `dynamicconfig/development.yaml` file? Do we have to customize Airbyte's Helm chart?
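For reference on the dynamic-config question: Temporal's dynamic config file maps each key to a list of `value`/`constraints` entries, and an empty `constraints` map applies the value globally. A minimal Python sketch that renders overrides in that layout; the helper name `render_dynamic_config` is hypothetical, and the keys in the usage line below (`frontend.rps`, `history.rps`, `matching.rps`) are assumed per-service RPS limits that should be checked against the Temporal documentation for the server version Airbyte ships. Whether raising them resolves this issue is not established here.

```python
import json


def render_dynamic_config(overrides):
    """Render {key: value} pairs in Temporal's dynamic-config YAML layout.

    Each key becomes a one-entry list with an empty constraints map, which
    applies the value globally. Handles bool, int, float and str values only.
    """
    lines = []
    for key in sorted(overrides):  # sorted for stable, diff-friendly output
        value = overrides[key]
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        elif isinstance(value, str):
            rendered = json.dumps(value)  # a JSON string is a valid YAML double-quoted scalar
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
        lines.append(f"{key}:")
        lines.append(f"  - value: {rendered}")
        lines.append("    constraints: {}")
    return "\n".join(lines) + "\n"
```

Usage, with placeholder numbers rather than tuned recommendations: `print(render_dynamic_config({"frontend.rps": 4800, "history.rps": 6000, "matching.rps": 2400}))`. How the resulting file is mounted into the Temporal pod under abctl or Helm is not shown here.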