airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination ClickHouse: out of memory with Incremental - Deduped History model #15352

Open yuxh opened 2 years ago

yuxh commented 2 years ago

Environment

Current Behavior

The job fails with "DB::Exception: Memory limit (for query) exceeded: .."; details are in the log below. Generating the _scd table seems to need far too much memory (even 32 GB is not enough), although the MongoDB source holds only millions of records.

Expected Behavior

Is there any way to reduce memory usage, for example by generating the table in batches?
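A minimal Python sketch of the batching idea follows. It assumes the history can be split by primary key. Every version of a given key hashes to the same bucket, so each bucket's history can be computed independently, and only one bucket is held in memory at a time. The record shape, the output column names (`start_at`, `end_at`, `active_row`) and all helper functions are hypothetical. This illustrates the idea only and is not Airbyte's normalization code.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (primary_key, cursor_value, payload)
Record = Tuple[str, int, dict]


def bucket_of(pk: str, n_buckets: int) -> int:
    # Stable hash: every version of a key always lands in the same bucket.
    return zlib.crc32(pk.encode("utf-8")) % n_buckets


def scd_rows_for_bucket(records: Iterable[Record]) -> List[dict]:
    # Group versions by primary key, then order each key's versions by cursor.
    by_key: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_key[rec[0]].append(rec)
    rows = []
    for pk, versions in by_key.items():
        versions.sort(key=lambda r: r[1])
        for i, (_, cursor, payload) in enumerate(versions):
            # A version is valid until the next version of the same key starts.
            end = versions[i + 1][1] if i + 1 < len(versions) else None
            rows.append({
                "pk": pk,
                "start_at": cursor,
                "end_at": end,
                "active_row": end is None,  # latest version is the active one
                "data": payload,
            })
    return rows


def batched_scd(source: Callable[[], Iterable[Record]], n_buckets: int) -> Iterator[dict]:
    # Rescan the source once per bucket; only that bucket's keys are grouped
    # in memory, so peak memory shrinks roughly with n_buckets.
    for b in range(n_buckets):
        bucket = (r for r in source() if bucket_of(r[0], n_buckets) == b)
        yield from scd_rows_for_bucket(bucket)
```

The trade-off is extra reads (one pass over the source per bucket) in exchange for a bounded memory footprint. In principle the same split could be expressed as one query per bucket that filters on a hash of the primary key.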

Logs

logs-21.txt

IzioDev commented 2 years ago

Potentially related issue: #9553

zczhuohuo commented 2 years ago

I ran into the same issue. ClickHouse normalization consumes far too much memory.