rui-mo opened this issue 1 month ago.
@mbasmanova Would you like to give us some input here? Thanks.
cc: @FelixYBW @zhouyuan @zml1206
@pedroerp setProbedFlag is a slow operation because of its random memory access pattern. To accelerate this use case, we could move the probed flags out of the rows and into a separate bit-mask array in the hash table. @oerling @mbasmanova @xiaoxmeng What do you think?
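Here is a minimal C++ sketch of the bit-mask idea. It assumes that build-side rows can be addressed by a dense row index. ProbedBitmask and its methods are hypothetical names, not Velox APIs. The flags for N rows occupy N/8 bytes in one contiguous array, instead of a flag stored inside each scattered row. That makes them much more likely to stay in cache during probing, and the final scan for probed rows reads memory sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical side structure holding one "probed" bit per build-side row.
// Assumes rows are addressed by a dense index in [0, numRows).
class ProbedBitmask {
 public:
  explicit ProbedBitmask(size_t numRows)
      : numRows_(numRows), words_((numRows + 63) / 64, 0) {}

  // Marks a build row as matched by at least one probe row.
  void setProbed(size_t row) {
    assert(row < numRows_);
    words_[row / 64] |= uint64_t{1} << (row % 64);
  }

  bool isProbed(size_t row) const {
    assert(row < numRows_);
    return (words_[row / 64] >> (row % 64)) & 1;
  }

  // Collects probed row indices in ascending order with a sequential scan.
  // A zero word lets the scan skip 64 unprobed rows at once.
  std::vector<size_t> probedRows() const {
    std::vector<size_t> result;
    for (size_t w = 0; w < words_.size(); ++w) {
      if (words_[w] == 0) continue;
      for (size_t b = 0; b < 64; ++b) {
        if ((words_[w] >> b) & 1) result.push_back(w * 64 + b);
      }
    }
    return result;
  }

 private:
  size_t numRows_;
  std::vector<uint64_t> words_;  // bit (row % 64) of word (row / 64)
};
```

Setting a flag can still be a random write. The difference is that the writes go to a small, dense array rather than to row memory spread across the whole build side.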
I wonder whether it would be faster to set the 'probed' flag during probing. I assume we load the 'row' anyway to compare join keys, so the data is already in memory. This could help at least for right and full joins without the extra filter.
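The suggestion above can be sketched as follows. The sketch assumes a simple open-addressing table over row pointers with unique build keys. Row, HashTable, and probeAndMark are hypothetical names, and the structure is much simpler than Velox's hash table. The key comparison already dereferences the row, so setting the flag at that point writes to a cache line that was just loaded. The alternative is to revisit the row in a separate pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical build-side row: the join key plus a "probed" flag stored in the row.
struct Row {
  int64_t key;
  bool probed = false;
};

// Minimal open-addressing hash table over row pointers (linear probing).
// Assumes unique build keys, a power-of-two capacity larger than the row
// count, and that the rows vector is not resized while the table is in use.
class HashTable {
 public:
  HashTable(std::vector<Row>& rows, size_t capacity)
      : slots_(capacity, nullptr), mask_(capacity - 1) {
    assert(capacity > rows.size() && (capacity & mask_) == 0);
    for (Row& row : rows) {
      size_t i = hash(row.key);
      while (slots_[i] != nullptr) i = (i + 1) & mask_;
      slots_[i] = &row;
    }
  }

  // Probes one key. The comparison row->key == key loads the row, so the
  // probed flag is set right there rather than in a later pass over the hits.
  bool probeAndMark(int64_t key) {
    for (size_t i = hash(key); slots_[i] != nullptr; i = (i + 1) & mask_) {
      Row* row = slots_[i];
      if (row->key == key) {
        row->probed = true;
        return true;
      }
    }
    return false;  // reached an empty slot: no match
  }

 private:
  size_t hash(int64_t key) const { return std::hash<int64_t>{}(key) & mask_; }

  std::vector<Row*> slots_;
  size_t mask_;
};
```

With duplicate build keys, every row in the matching chain would need its flag set. This sketch does not model that case.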
@mbasmanova This would work for the non-array hash mode. For the array hash mode (which is probably the case here), we still need to find another way to accelerate it.
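For array hash mode, one possible direction is to keep the probed flags per table slot rather than per row. This assumes that an array-mode probe finds a row by indexing the table with key - minKey, without comparing keys, which would explain why the row is not already in memory there. The sketch is hypothetical: ArrayModeTable is not a Velox class, and it assumes unique integer keys in [minKey, maxKey]. Probing touches only the slot array and a packed bit vector. Rows are dereferenced only when the matched rows are emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical build-side row; payload stands in for the other columns.
struct Row {
  int64_t key;
  int64_t payload;
};

// Array-mode sketch: slot = key - minKey, so a probe is a direct index with
// no key comparison. Probed flags live per slot in a packed bit vector, so
// marking a hit does not touch the row either.
class ArrayModeTable {
 public:
  ArrayModeTable(std::vector<Row>& rows, int64_t minKey, int64_t maxKey)
      : minKey_(minKey),
        slots_(static_cast<size_t>(maxKey - minKey + 1), nullptr),
        probed_(slots_.size(), false) {
    for (Row& row : rows) {
      assert(row.key >= minKey && row.key <= maxKey);
      slots_[static_cast<size_t>(row.key - minKey_)] = &row;
    }
  }

  // Probe phase: reads the slot array and writes the bit vector only.
  void probe(int64_t key) {
    if (key < minKey_ || key - minKey_ >= static_cast<int64_t>(slots_.size())) {
      return;  // outside the key range: no match
    }
    size_t slot = static_cast<size_t>(key - minKey_);
    if (slots_[slot] != nullptr) probed_[slot] = true;
  }

  // Output phase of a right semi join: each matched build row once.
  std::vector<const Row*> matchedRows() const {
    std::vector<const Row*> out;
    for (size_t slot = 0; slot < slots_.size(); ++slot) {
      if (probed_[slot]) out.push_back(slots_[slot]);
    }
    return out;
  }

 private:
  int64_t minKey_;
  std::vector<Row*> slots_;
  std::vector<bool> probed_;  // std::vector<bool> packs one bit per element
};
```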
Description
In TPC-DS q14a/q14b, there is a left semi join that builds on the right side, where the right table is 10G and the left table is 3M. Given that a LEFT b == b RIGHT a, we switched to a right semi join in Velox so that the hash table is built on the smaller table, reducing memory pressure during the hash build. However, with this change we observe a performance regression from 30s to 300s, which led us to look into the performance of the right semi join. In https://github.com/rui-mo/velox/commit/e05691a6648ebd5a540ef3b3baae8868a48b38fd, we added several tests that use various join types (and therefore build sides) over the same left and right tables. Here is the performance data we obtained. The data show that the left and right outer joins perform similarly, but the right semi join is much slower than the left semi join.
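To make the equivalence concrete, the sketch below computes a LEFT SEMI b with b as the build side, and b RIGHT SEMI a with a as the build side. It works on integer join keys only. It is a simplified model, not Velox's operator, and leftSemiBuildB and rightSemiBuildA are hypothetical names. Both functions return the rows of a that have a match in b. Only the right semi join needs per-build-row state: a probed flag that is set on every match and scanned after the probe side is exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// a LEFT SEMI JOIN b, building on b: hash b's keys, stream a, and emit each
// row of a that finds a match. No per-row state is needed.
std::vector<int64_t> leftSemiBuildB(const std::vector<int64_t>& a,
                                    const std::vector<int64_t>& b) {
  std::unordered_set<int64_t> build(b.begin(), b.end());
  std::vector<int64_t> out;
  for (int64_t key : a) {
    if (build.count(key) != 0) out.push_back(key);
  }
  return out;
}

// b RIGHT SEMI JOIN a, building on a: hash a's rows, stream b, and set a
// "probed" flag on every matched build row. Matched rows are emitted only
// after the whole probe side has been consumed.
std::vector<int64_t> rightSemiBuildA(const std::vector<int64_t>& a,
                                     const std::vector<int64_t>& b) {
  std::unordered_map<int64_t, std::vector<size_t>> build;  // key -> row indices in a
  for (size_t i = 0; i < a.size(); ++i) build[a[i]].push_back(i);
  std::vector<bool> probed(a.size(), false);
  for (int64_t key : b) {
    auto it = build.find(key);
    if (it == build.end()) continue;
    for (size_t row : it->second) probed[row] = true;  // the flag-setting step
  }
  std::vector<int64_t> out;
  for (size_t i = 0; i < a.size(); ++i) {
    if (probed[i]) out.push_back(a[i]);
  }
  return out;
}
```

In the right semi join form, the flag write runs once per matching probe row. With the much larger table on the probe side, it can therefore execute far more often than the number of build rows would suggest.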
We collected a performance profile of the right semi join, which shows that setProbedFlag is a hotspot.