airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination PubSub - Large event stream from Amplitude Source makes Google Pubsub destination pod worker go OOM #19966

Open coadan opened 1 year ago

coadan commented 1 year ago

Environment

Current Behavior

Large event stream from Amplitude Source makes Google Pubsub destination pod worker go OOM

Expected Behavior

The connector should periodically flush buffered messages from memory before the pod reaches its memory limit.
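To make the expected behavior concrete, here is a minimal sketch of a size-bounded buffer that flushes once a byte threshold is crossed. It is not the actual destination-pubsub code: the class name `BoundedFlushBuffer`, the `maxBufferedBytes` threshold, and the `sink` callback are all hypothetical names used for illustration, and the sink stands in for whatever publishes a batch to Pub/Sub.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical size-bounded buffer; an illustration, not the connector's real code. */
final class BoundedFlushBuffer {
    private final long maxBufferedBytes;          // assumed threshold, set well below the heap limit
    private final Consumer<List<String>> sink;    // assumed stand-in for "publish this batch"
    private final List<String> pending = new ArrayList<>();
    private long bufferedBytes = 0;

    BoundedFlushBuffer(long maxBufferedBytes, Consumer<List<String>> sink) {
        this.maxBufferedBytes = maxBufferedBytes;
        this.sink = sink;
    }

    /** Buffers one serialized record and flushes once the byte threshold is reached. */
    void accept(String record) {
        pending.add(record);
        bufferedBytes += record.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes >= maxBufferedBytes) {
            flush();
        }
    }

    /** Hands the buffered records to the sink and releases them from memory. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));    // pass a copy so the sink owns its batch
        pending.clear();
        bufferedBytes = 0;
    }
}
```

With a buffer like this, memory held by pending records stays roughly bounded by the threshold regardless of how large the source stream is, provided the sink itself blocks or applies backpressure rather than queueing batches without limit. A final `flush()` call at the end of the sync would be needed to send any remaining records.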

Logs

Section of the log that includes the error:

2022-11-30 16:06:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):78 - Terminating due to java.lang.OutOfMemoryError: Java heap space

logs_16104_txt.txt

Steps to Reproduce

  1. Set up a large enough source stream from Amplitude
  2. Sync to PubSub
  3. Error appears as the pod reaches its memory limit

marcosmarxm commented 1 year ago

Did you try increasing the resources allocated to the connector pod?

coadan commented 1 year ago

@marcosmarxm I have been using the same connector resource config and the same source to sync to both BigQuery and GCS without issue, so I find it odd that it would struggle only when sending to Pubsub because of resource constraints.

midavadim commented 1 year ago

@coadan

Based on the description, and the fact that the Amplitude connector works fine with other destinations, the problem is actually with the PubSub destination.

So my plan is to change the issue title to refer to the PubSub destination.