Instead of constructing the index while sinking data into the create index operator, we now buffer all input first and then spawn as many tasks as there are threads, each building its part of the index in parallel with all the data available. This means we parallelize over vectors instead of row groups, regardless of how much data we receive, and no longer need to resize/reallocate the index multiple times (with the associated extra locking) during construction.
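A minimal sketch of the buffer-then-build approach described above. All names here are hypothetical, and a sorted vector stands in for the real index structure; the point is the shape: buffer everything, split the buffered rows evenly across one task per thread, build partial indexes without any locking, then merge sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <thread>
#include <vector>

// Stand-in for one buffered input row; the real operator buffers whole chunks.
struct Row {
	int key;
};

// Each task builds a partial index over its slice of the buffered rows.
// A locally sorted vector stands in for the actual index structure.
static std::vector<int> BuildPartial(const std::vector<Row> &rows, size_t begin, size_t end) {
	std::vector<int> part;
	part.reserve(end - begin);
	for (size_t i = begin; i < end; ++i) {
		part.push_back(rows[i].key);
	}
	std::sort(part.begin(), part.end());
	return part;
}

// Buffer-then-build: all input is already materialized in `rows`, so we can
// size everything up front, split the work across `num_threads` tasks, and
// avoid incremental resizing and locking during construction.
static std::vector<int> BuildIndexParallel(const std::vector<Row> &rows, size_t num_threads) {
	const size_t n = rows.size();
	num_threads = std::max<size_t>(1, std::min(num_threads, n ? n : 1));
	std::vector<std::vector<int>> partials(num_threads);
	std::vector<std::thread> workers;
	const size_t chunk = (n + num_threads - 1) / num_threads;
	for (size_t t = 0; t < num_threads; ++t) {
		const size_t begin = std::min(t * chunk, n);
		const size_t end = std::min(begin + chunk, n);
		// Each worker writes only to its own slot, so no locking is needed.
		workers.emplace_back([&rows, &partials, t, begin, end] {
			partials[t] = BuildPartial(rows, begin, end);
		});
	}
	for (auto &w : workers) {
		w.join();
	}
	// Merge the partial indexes sequentially; with all data in hand this is
	// one pass per partial rather than many small locked insertions.
	std::vector<int> index;
	for (const auto &part : partials) {
		std::vector<int> merged;
		merged.reserve(index.size() + part.size());
		std::merge(index.begin(), index.end(), part.begin(), part.end(), std::back_inserter(merged));
		index = std::move(merged);
	}
	return index;
}
```

The sequential merge at the end is the trade-off: the partial builds scale with the thread count, while the final combine is a cheap linear pass because each partial is already ordered.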
On my machine this gives an almost 10x performance increase, and there are still several smaller optimizations left to do.