apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Insert/Upsert in 0.10.1 is slow compared to 0.8.0 #5980

Status: Open. bkosuru opened this issue 2 years ago.

bkosuru commented 2 years ago

Hello,

We are trying to upgrade an HDFS-based table from Hudi 0.8.0 to 0.10.1 (we cannot upgrade to 0.11.1 yet) and noticed a big performance hit on insert and upsert.

Insert: 5.7 min in 0.8.0 vs. 12 min in 0.10.1

Upsert: 27 min in 0.8.0 vs. 66 min in 0.10.1
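Expressed as ratios, these timings mean 0.10.1 takes roughly twice as long for inserts and almost two and a half times as long for upserts. A small Python sketch of that arithmetic, using only the numbers reported in this comment (the `slowdown` helper is illustrative, not part of Hudi):

```python
# Wall-clock timings (minutes) reported in this comment for Hudi 0.8.0 vs 0.10.1.
timings = {
    "insert": {"0.8.0": 5.7, "0.10.1": 12.0},
    "upsert": {"0.8.0": 27.0, "0.10.1": 66.0},
}


def slowdown(op: str, baseline: str = "0.8.0", candidate: str = "0.10.1") -> float:
    """Return how many times longer `candidate` takes than `baseline` for `op`."""
    return timings[op][candidate] / timings[op][baseline]


for op in timings:
    print(f"{op}: {slowdown(op):.2f}x slower in 0.10.1")
```

This prints `insert: 2.11x slower in 0.10.1` and `upsert: 2.44x slower in 0.10.1`.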

The writer config we use is described in issue #5741 (https://github.com/apache/hudi/issues/5741). Is this a known issue? Is there any additional setting we need to use in 0.10.1?

Here are the screenshots for the upsert:

[Screenshots of the upsert job, taken 2022-06-26 at 9:56 PM]

Thanks, Bindu

danny0405 commented 2 years ago

cc @nsivabalan, I have the impression that we already fixed this performance regression. Do you remember which patch?

bkosuru commented 2 years ago

Insert: 5.7 min in 0.8.0, 12 min in 0.10.1, 5.7 min in 0.11.1

Upsert: 27 min in 0.8.0, 66 min in 0.10.1, 42.5 min in 0.11.1

Using these options:

    .option("hoodie.metadata.index.bloom.filter.enable", "true")
    .option("hoodie.metadata.index.column.stats.enable", "true")
    .option("hoodie.index.type", "BLOOM")
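For readers using PySpark, the three options in this comment could be collected into a dict and passed through `DataFrameWriter.options`. This is a sketch, assuming a Python writer (the original job may well be Scala) and leaving out the rest of the writer config from #5741; `write_hudi`, `df`, and `table_path` are hypothetical names.

```python
# Options quoted in this comment for the Hudi 0.11.1 run.
# Assumption: the rest of the writer config (table name, record key, precombine
# field, etc.) comes from elsewhere and is passed in via `extra_options`.
HUDI_0_11_1_OPTIONS = {
    "hoodie.metadata.index.bloom.filter.enable": "true",
    "hoodie.metadata.index.column.stats.enable": "true",
    "hoodie.index.type": "BLOOM",
}


def write_hudi(df, table_path, extra_options=None):
    """Hypothetical helper: write the Spark DataFrame `df` to the Hudi table at `table_path`."""
    options = {**HUDI_0_11_1_OPTIONS, **(extra_options or {})}
    (df.write.format("hudi")
        .options(**options)
        .mode("append")  # Hudi inserts/upserts are issued with append mode
        .save(table_path))
```

Keeping the options in one dict makes it easy to change a single flag between benchmark runs while holding everything else constant.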

yihua commented 2 years ago

@bkosuru, for 0.11.1, could you turn off the column stats and bloom filter indexes in the metadata table and see whether that brings the write latency back on par?
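One way to run that experiment, sketched under the same assumption of a Python writer: keep the writer config identical and flip only the two metadata-table index flags to `false`. The helper `without_metadata_indexes` is hypothetical.

```python
def without_metadata_indexes(options: dict) -> dict:
    """Return a copy of `options` with the metadata-table bloom filter and
    column stats indexes disabled, leaving every other setting untouched."""
    return {
        **options,
        "hoodie.metadata.index.bloom.filter.enable": "false",
        "hoodie.metadata.index.column.stats.enable": "false",
    }


# The 0.11.1 settings reported earlier in the thread.
base = {
    "hoodie.metadata.index.bloom.filter.enable": "true",
    "hoodie.metadata.index.column.stats.enable": "true",
    "hoodie.index.type": "BLOOM",
}
tuned = without_metadata_indexes(base)
print(tuned)
```

Returning a new dict rather than mutating `base` lets both configurations be benchmarked side by side.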

bkosuru commented 2 years ago

@yihua I will test when I get a chance. Since insert performs well in 0.11.1 and upserts are rare for us, we will upgrade to 0.11.1. You can lower the priority to minor. Thanks

qjqqyy commented 2 years ago

Regarding the question above about which patch fixed the performance regression: it seems to be related to #4012, which was fixed in 0.11.0.

nsivabalan commented 1 year ago

Yes, we have made quite a few performance fixes in 0.12. Can you wait a couple of days and give 0.12 a try? Highly recommended if you are looking for better performance.

nsivabalan commented 1 year ago

@bkosuru: we have made a lot of performance fixes in 0.12 on both the read and write side. Can you try 0.12 and let us know what you see? Please disable the bloom filter and column stats, and try both with and without the metadata table enabled. Curious to know how this fares.
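The requested comparison is a small test matrix: each write operation, with and without the metadata table, and the metadata bloom filter and column stats indexes off in every run. A sketch of generating those option sets in Python, assuming the `BLOOM` index type stays in place (the comment does not say to change it); `benchmark_variants` is hypothetical, while `hoodie.metadata.enable` and `hoodie.datasource.write.operation` are standard Hudi write configs.

```python
import itertools

# Settings common to every run: metadata-table bloom filter and column stats off.
COMMON = {
    "hoodie.index.type": "BLOOM",  # assumption: index type unchanged
    "hoodie.metadata.index.bloom.filter.enable": "false",
    "hoodie.metadata.index.column.stats.enable": "false",
}


def benchmark_variants():
    """Yield (label, options) pairs covering insert/upsert x metadata on/off."""
    for operation, metadata in itertools.product(["insert", "upsert"], [True, False]):
        options = {
            **COMMON,
            "hoodie.datasource.write.operation": operation,
            "hoodie.metadata.enable": str(metadata).lower(),
        }
        label = f"{operation}/metadata={'on' if metadata else 'off'}"
        yield label, options


for label, options in benchmark_variants():
    print(label, options["hoodie.metadata.enable"])
```

Generating the variants from one shared dict keeps the runs comparable: only the operation and the metadata flag differ between them.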

xushiyan commented 1 year ago


@bkosuru, would you mind redoing the benchmark using 0.12.1? We would like to verify whether the performance gaps are resolved.
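When redoing the benchmark, timing each write the same way across versions keeps the numbers comparable. A minimal timing harness, assuming each run is wrapped in a zero-argument callable (for example, a closure around the Spark write); `time_run` is hypothetical and reports minutes to match the units used in this thread.

```python
import time
from typing import Callable


def time_run(run: Callable[[], None], repeats: int = 1) -> float:
    """Execute `run` `repeats` times and return the best wall-clock time in minutes."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run()
        best = min(best, time.perf_counter() - start)
    return best / 60.0
```

Taking the best of several repeats reduces noise from cluster contention, though for writes that mutate the table each repeat should target a fresh copy of the table.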

bkosuru commented 1 year ago

This will take me a while to setup and test. I will do it when time permits. Thanks!

ad1happy2go commented 1 year ago

@bkosuru, did updating to the latest version improve the performance? Do you still need help with this?