airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[orchestrator-repl] High orchestrator CPU during slow sync #37837

Open stephen-up opened 4 months ago

stephen-up commented 4 months ago

Helm Chart Version

0.64.308

What step the error happened?

During the Sync

Relevant information

Hi, I'm working with some large, slow Mixpanel syncs where the orchestrator pods consume a lot of CPU. Most of the CPU time appears to be spent on a lock in BufferedReplicationWorker, which seems to just be waiting for messages from the Mixpanel source. The source itself is very slow, but while it waits, the orchestrator ideally shouldn't need to consume so much CPU.

If I take some stack dumps, I can see lots of CPU time in BufferedReplicationWorker.


The CPU stays high for most of the sync, which in this case lasts hours.


Is there some way the orchestrator could wait for messages from the source without consuming so much CPU?

Thanks

Relevant log output

No response

marcosmarxm commented 4 months ago

@stephen-up, apart from high CPU usage, did you notice any other issues? Does the sync complete eventually?

stephen-up commented 4 months ago

Hi @marcosmarxm. Yeah, it completes eventually.

Mixpanel has a very restrictive API rate limit, so the source backs off and sleeps a lot. The orchestrator therefore doesn't seem to be doing useful work with all the CPU it's consuming. It looks like it's spending all its time in BufferedReplicationWorker calling !messagesForDestinationQueue.isDone()
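For illustration, the difference between spinning on a status check and blocking on the queue can be sketched as follows. This is a hypothetical example, not Airbyte's BufferedReplicationWorker code: the class and method names (BlockingConsumerSketch, publish, markDone, drain) and the 100 ms timeout are assumptions. It relies on java.util.concurrent.BlockingQueue.poll(timeout, unit), which parks the consumer thread until a record arrives or the timeout expires, so a consumer waiting on a slow, rate-limited source stays mostly idle instead of keeping a core busy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: NOT Airbyte's BufferedReplicationWorker implementation.
public class BlockingConsumerSketch {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
    private final AtomicBoolean sourceDone = new AtomicBoolean(false);

    // Producer side: the (slow) source pushes records; blocks only if the queue is full.
    public void publish(String record) throws InterruptedException {
        queue.put(record);
    }

    // Producer side: called once after the last record has been published.
    public void markDone() {
        sourceDone.set(true);
    }

    // Consumer side: waits up to 100 ms per poll (thread is parked, not spinning).
    public int drain() throws InterruptedException {
        int consumed = 0;
        while (true) {
            String record = queue.poll(100, TimeUnit.MILLISECONDS);
            if (record != null) {
                consumed++; // a real worker would hand the record to the destination here
            } else if (sourceDone.get() && queue.isEmpty()) {
                return consumed; // source finished and nothing left to deliver
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingConsumerSketch sketch = new BlockingConsumerSketch();
        Thread source = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    sketch.publish("record-" + i);
                    Thread.sleep(200); // simulate a rate-limited source backing off
                }
                sketch.markDone();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();
        System.out.println("consumed " + sketch.drain()); // prints "consumed 5"
        source.join();
    }
}
```

The poll timeout bounds how long the consumer takes to notice the done flag after the queue drains; it does not add latency to records, because poll returns as soon as one is available.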

marcosmarxm commented 4 months ago

@stephen-up this is expected for now. The team is working on a fix for a future release.

stephen-up commented 4 months ago

Thanks.

damour commented 2 months ago

Same issue with [google ads/bing/zendesk] -> [bigquery] connections. It would be nice to be able to configure a sleep() interval to reduce CPU usage.
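A configurable sleep interval like the one suggested could look roughly like the sketch below. It is hypothetical: BackoffIdler, minSleepMs and maxSleepMs are illustrative names, not existing Airbyte settings. After each empty poll, the idler sleeps and doubles the next interval up to a cap. After a poll that finds work, it resets to the shortest interval.

```java
// Hypothetical idle strategy: NOT an existing Airbyte configuration option.
public final class BackoffIdler {
    private final long minSleepMs;
    private final long maxSleepMs;
    private long currentSleepMs;

    public BackoffIdler(long minSleepMs, long maxSleepMs) {
        if (minSleepMs <= 0 || maxSleepMs < minSleepMs) {
            throw new IllegalArgumentException("require 0 < minSleepMs <= maxSleepMs");
        }
        this.minSleepMs = minSleepMs;
        this.maxSleepMs = maxSleepMs;
        this.currentSleepMs = minSleepMs;
    }

    // Call after a poll that found work: go back to the shortest interval.
    public void reset() {
        currentSleepMs = minSleepMs;
    }

    // Call after a poll that found nothing: sleep, then double the next
    // interval, capped at maxSleepMs (the check avoids long overflow).
    // Returns how long this call slept, in milliseconds.
    public long idle() throws InterruptedException {
        long slept = currentSleepMs;
        Thread.sleep(slept);
        currentSleepMs = (currentSleepMs > maxSleepMs / 2) ? maxSleepMs : currentSleepMs * 2;
        return slept;
    }

    // The interval the next idle() call will sleep for.
    public long nextSleepMs() {
        return currentSleepMs;
    }
}
```

In a polling loop this would be used as `if (foundWork) idler.reset(); else idler.idle();`. The trade-off is latency: once the source goes quiet, a new record may wait up to maxSleepMs before it is picked up, which is usually acceptable for syncs that already spend most of their time in rate-limit backoff.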