airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Add pagination or lazy loading to the connections UI #27364

Open sh4sh opened 1 year ago

sh4sh commented 1 year ago

What area does the feature impact?

Frontend / airbyte-webapp

Relevant Information

When there are a lot of connections set up, loading the full list can take a very long time and cause performance degradation.

It would be awesome if the list of connections could be loaded in chunks, maybe through pagination or lazy loading.
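To make the request concrete, here is a minimal sketch of offset/limit pagination. It is illustrative only and is not Airbyte's API: `Page`, `fetch_page`, and `iter_pages` are hypothetical names, and the in-memory list stands in for whatever the server would actually query.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """One chunk of a larger list (hypothetical shape, not Airbyte's API)."""
    items: List[T]
    offset: int
    total: int

    @property
    def next_offset(self) -> Optional[int]:
        # Offset of the next page, or None once the list is exhausted.
        end = self.offset + len(self.items)
        return end if end < self.total else None


def fetch_page(rows: Sequence[T], offset: int, limit: int) -> Page[T]:
    """Return a single slice of rows; stands in for a paginated list endpoint."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return Page(items=list(rows[offset:offset + limit]), offset=offset, total=len(rows))


def iter_pages(rows: Sequence[T], limit: int) -> Iterator[Page[T]]:
    """Lazily yield pages, so a caller only pays for the chunks it consumes."""
    offset: Optional[int] = 0
    while offset is not None:
        page = fetch_page(rows, offset, limit)
        yield page
        offset = page.next_offset
```

A UI built on something like this would request the first page up front and fetch later pages only when the user scrolls or clicks through, instead of loading every connection at once.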

josephkmh commented 1 year ago

@chandlerprall is looking into this!

teallarson commented 1 year ago

Refining note:

nataliekwong commented 10 months ago

@alex-gron Tagging you here since you shared feeling the pain on this. It also came up today during the Compose planning.

nataliekwong commented 10 months ago

I believe there's an effort to migrate some parts of this page to Micronaut, which means we can't touch it until that is done? (Please chime in if I'm mistaken!)

I created a query that shows how many other workspaces face an issue at least as severe as our own internal workspace does. The two groups shown are workspaces that have as many or more connections than our internal workspace, and workspaces that have fewer. I don't have a great alternative way to figure out blast radius other than to use our own workspace as an example.

josephkmh commented 10 months ago

To add a little more data here, I used this query in Datadog to pull data on loading time by percentiles:

service:airbyte-server env:prod operation_name:netty.request resource_name:"POST /api/v1/web_backend/connections/list"

So only about 10% of requests to the connections list endpoint are what I would call "slow" and maybe 1-2% are "super slow". Unfortunately Datadog doesn't have data about workspace IDs, so I can't see what percentage of actual workspaces are slow, just of total requests.
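For readers less familiar with percentile-based latency analysis, the sketch below shows what figures such as p90 and p99 mean, using the nearest-rank method. The latency values are made up for illustration and are not taken from the Datadog data discussed here. `percentile` and `share_above` are hypothetical helpers, and Datadog may compute its percentiles differently.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def share_above(values: Sequence[float], threshold: float) -> float:
    """Fraction of samples strictly above a threshold, e.g. the share of 'slow' requests."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v > threshold for v in values) / len(values)


# Hypothetical request durations in milliseconds (illustrative only).
latencies_ms = [120, 150, 180, 200, 250, 300, 400, 900, 2500, 12000]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
slow_share = share_above(latencies_ms, 2000)
```

Note that these are shares of requests, not of workspaces: a single workspace with many connections can account for a disproportionate number of the slow samples.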

luiz-cesa commented 7 months ago

Hello friends, any news on this?

chandlerprall commented 7 months ago

Looking at averages, the trend for POST /api/v1/web_backend/connections/list has continued to go down. Datadog's APM traces for that endpoint appear to flag 2 SQL queries that can spike quite a bit and can be looked into. I don't think we get specific workspace IDs from those traces though, which could make them harder to optimize.

cesar-loadsmart commented 7 months ago

Hey folks, any news on this topic?

Airbyte landing page is basically not functional for us.

josephkmh commented 6 months ago

@cesar-loadsmart we don't have this committed on the roadmap currently, but I'm interested to hear more about your setup. Our current understanding is that users suffering from this are outliers, but maybe that's incorrect. Are you on Airbyte Cloud or OSS? How many connections do you have in your workspace where the lack of pagination is slowing the UI down so dramatically?

cesar-loadsmart commented 6 months ago

Hey @josephkmh, we run Airbyte Open Source. We have around 102 active connections, some of which run on short intervals, so our jobs table has ~800K records. Every time we load the Airbyte landing page we can see our Postgres database struggling, even though queries continue to run within a reasonable time in the database.

ericsalesdeandrade commented 5 months ago

We have around 2500 connections and the UI is practically unusable.

It's because we've made a connection for every table in the database as we found it hard to club together streams in a single connection due to the high variability of table size / # of rows, causing sync failures. If anyone has any advice for me that would be greatly appreciated. Thanks a lot :)

pai911 commented 3 months ago

For our company, if we're gonna adopt Airbyte fully, this is a must-fix issue. Per Airbyte's design, it's normal to have a large number of connections.

chandlerprall commented 1 day ago

We have now taken multiple passes at UI performance issues for the connections list, including optimizing the DOM, improving the response to user actions on the page, and, a couple of days ago, enabling table virtualization.

I'm going to let this sit for a few more days and then check metrics again. My expectation is we will see improved performance on the page, with some spikes still coming from API response times, and we can reassess those endpoints without a ton of extra noise introduced by the UI.
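As background on the table virtualization mentioned above: the general idea is to render only the rows that fall inside (or just outside) the viewport, however long the list is. The sketch below shows that window calculation in isolation. It is not the airbyte-webapp implementation: `visible_window` and its parameters are hypothetical, and it assumes fixed-height rows.

```python
import math
from typing import Tuple


def visible_window(
    scroll_top: int,
    viewport_height: int,
    row_height: int,
    total_rows: int,
    overscan: int = 3,
) -> Tuple[int, int]:
    """Return the half-open [first, last) range of row indices worth rendering.

    Assumes every row has the same pixel height. `overscan` adds a few extra
    rows on each side so fast scrolling does not briefly show blank space.
    """
    if row_height <= 0:
        raise ValueError("row_height must be > 0")
    first_visible = int(scroll_top // row_height)
    last_visible = math.ceil((scroll_top + viewport_height) / row_height)
    first = max(0, first_visible - overscan)
    last = min(total_rows, last_visible + overscan)
    return first, last


# With 2500 rows of 40px each in a 600px viewport, only a couple dozen rows are rendered.
top_of_list = visible_window(scroll_top=0, viewport_height=600, row_height=40, total_rows=2500)
mid_list = visible_window(scroll_top=4000, viewport_height=600, row_height=40, total_rows=2500)
end_of_list = visible_window(scroll_top=99400, viewport_height=600, row_height=40, total_rows=2500)
```

Virtualization reduces the rendering cost in the browser but not the cost of fetching the full list, which is why API response times are still a separate concern.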