Open agavra opened 1 year ago
we might be able to create our own RelDistributionImpl class and solve it that way.
There seems to be some development that can work around this (for example https://github.com/apache/pinot/pull/11630/ only deal with single column key)
I will follow up and see if there's better options
Background: imagine we have the two following table schemas:
and I want to issue
SELECT * FROM A JOIN B ON A.colA = B.col2 AND A.colB = B.col1
. In this case, we will need to hash shuffle the rows to match partitions. In Calcite, we generate a hash distribution with the same join keys:When we do this, Calcite will order the join keys passed into
RelDistributions.hash
to be in ascending order, even if we passed them in with a particular ordering. In the example above, we would have called:But calcite reorders them to both be
[1, 2]
.This causes a problem when we hash our keys. Imagine we had the following rows:
If we simply use the column ordering that calcite provides for the hash exchange, we may not hash these two rows into the same partition (
hash(1, "foo")
andhash("foo", 1)
produce different results).9996 provides a workaround for the solution by using a hash code algorithm that intentionally generates collisions independent of the ordering of the columns (using hash code addition).
This is likely an acceptable solution in the long run so long as the distribution of the hash codes is semi-random even after addition (some initial experimentations show that it is). A better solution, however, would be to fix calcite to maintain the join key ordering so that we can just hash the exact keys.