Closed kagamiori closed 1 year ago
This problem is caused by undefined behavior of std::multiset<MaterializedRow, MaterializedRowComparator> used when comparing the expected and actual results in QueryAssertions.cpp. The undefined behavior is due to MaterializedRowComparator not satisfying "strict weak ordering" required by std::multiset.
@Yuhta proposed one possible solution of using sort-merge to replace the use of std::multiset. First, we use the basic comparison without epsilon to ensure well-defined behavior of sorting. Then during the merge phase, we use epsilon in the comparisons to tolerate imprecision of floating point numbers.
More details about the issue:
... the lessThanWithEpsilon logic in MaterializedRowComparator does not establish a "strict weak ordering" as required by the standard. Consider the following example with 3 vectors (a,b,c) and epsilon (e) equal to 3:
a = {2, 4}
b = {0, 8}
c = {4, 0}
e = 3
a < b < c < a
2 = 0 < 4 = 2
4 < 8 0 < 4
This is not transitive.
Two other failures related to this:
@oerling pointed out that sorting could result in different order when error is added. An example is sorting these 2 tuples:
(0, 0)
(0.0001, 100)
when there is an error of 0.001 on the first column of first row, the result becomes
(0.0001, 100)
(0.001, 0)
And comparison on second column would fail. In this case we probably need to do maximum bipartite matching.
And comparison on second column would fail. In this case we probably need to do maximum bipartite matching.
Hi @Yuhta, Thank you for the explanation! I looked at the algorithms for the maximum bipartite matching problem. The mostly used Fold-Fulkerson and Kuhn-Munkres algorithms have O(VE) time complexity where V is the number of rows in one result and E is V^2 in our problem. Since we usually use 1000 as the batch size in our tests, the complexity would be O(10^9) in the worst case. (The more complex Hopcroft–Karp algorithm has O(V^1/2 E) complexity, which would be around O(3 * 10^7) in the worst case.)
Since the original code that compares results using std::multiset has O(V * logV), I'm worrying the maximum bipartite matching algorithms might be too slow and wondering about other options we have.
cc @mbasmanova @oerling
For reference, here is how this is implemented in Presto: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/testing/MaterializedRow.java#L185
@kagamiori E usually is not as bad as V^2 because an edge only exists if the difference between the 2 sides is under epsilon. The complexity is at least O(V^2) though, worse than the current version. We could do some sort and clustering to improve it though if this becomes a problem.
For the rounding solution (as @mbasmanova pointed this is also used in Presto), there is a problem that if 2 values are close enough, but they fall into two different buckets, we will have a false positive. For example if we round 2.715 to 2 digits after decimal point, it becomes 2.72, but 2.714999999 would become 2.71, even they are very close to each other.
Perhaps, most use cases are aggregations with non-floating point keys and floating point aggregates. We could first bucket the rows by non-floating point columns, then compare with epsilon within the buckets. Assuming buckets will be very small (1 row), we could use some simple algorithms and throw if bucket size is large (> 5 rows).
@mbasmanova Yes that's one optimization, we should reorder the columns to put floating point columns at end and sort and pair them by non-floating point columns first
@kagamiori E usually is not as bad as V^2 because an edge only exists if the difference between the 2 sides is under epsilon. The complexity is at least O(V^2) though, worse than the current version. We could do some sort and clustering to improve it though if this becomes a problem.
For the rounding solution (as @mbasmanova pointed this is also used in Presto), there is a problem that if 2 values are close enough, but they fall into two different buckets, we will have a false positive. For example if we round 2.715 to 2 digits after decimal point, it becomes 2.72, but 2.714999999 would become 2.71, even they are very close to each other.
Thanks @mbasmanova and @Yuhta! Presto rounds floating point numbers to 5 significant digits by default. The problem that @Yuhta pointed out can indeed exist. I'll prototype a Fold-Fulkerson or Kuhn-Munkres algorithms to see if the performance of maximum bipartite matching is acceptable for us.
Perhaps, most use cases are aggregations with non-floating point keys and floating point aggregates. We could first bucket the rows by non-floating point columns, then compare with epsilon within the buckets. Assuming buckets will be very small (1 row), we could use some simple algorithms and throw if bucket size is large (> 5 rows).
@mbasmanova Yes that's one optimization, we should reorder the columns to put floating point columns at end and sort and pair them by non-floating point columns first
@mbasmanova @Yuhta, Sorry I somehow didn't see your last two messages when I replied. I think the optimization you mentioned would work for aggregation tests. But assertEqualResults() and assertResults() (that this bug is in) are also used in join fuzzer, TaskTest, HashJoinTest, MergeJoinTest, and OperatorTestBase. For these use cases, we may not be guaranteed to have small buckets.
But assertEqualResults() and assertResults() (that this bug is in) are also used in join fuzzer, TaskTest, HashJoinTest, MergeJoinTest, and OperatorTestBase. For these use cases, we may not be guaranteed to have small buckets.
That's a good point. In these cases through floating point values tend to be copied directly from input to output as opposed to be computed. Hence, the chance for divergence is much lower than in aggregations.
Description
This is a failure of aggregation fuzzer test, but the error message doesn't suggest there is unmatched result. We need to investigate into the error to tell whether it is a bug in the fuzzer itself or the core library.
Error Reproduction
In internal repository, checkout 7706e3790 and run
velox/exec/tests/velox_aggregation_fuzzer_test --seed 2831077879
.Relevant logs