Aggregation fuzzer test fails with unmatched result

kagamiori commented 1 year ago

Description

This is a failure of the aggregation fuzzer test, but the error message does not show any unmatched results: it reports 0 extra rows and 0 missing rows. We need to investigate whether this is a bug in the fuzzer itself or in the core library.

Error Reproduction

In the internal repository, check out 7706e3790 and run velox/exec/tests/velox_aggregation_fuzzer_test --seed 2831077879.

Relevant logs

I1212 11:51:53.716396 3375096 AggregationFuzzer.cpp:503] ==============================> Started iteration 0 (seed: 2831077879)
I1212 11:51:53.726944 3375096 AggregationFuzzer.cpp:595] Executing query plan: 
-- Aggregation[SINGLE [g0, g1, g2] a0 := stddev_pop(ROW["c0"])] -> g0:REAL, g1:BOOLEAN, g2:TIMESTAMP, a0:DOUBLE
  -- Values[1000 rows in 10 vectors] -> c0:BIGINT, g0:REAL, g1:BOOLEAN, g2:TIMESTAMP
I1212 11:51:53.752323 3375096 AggregationFuzzer.cpp:610] [ROW ROW<g0:REAL,g1:BOOLEAN,g2:TIMESTAMP,a0:DOUBLE>: 853 elements, no nulls]
I1212 11:51:53.827328 3375096 AggregationFuzzer.cpp:192] Testing plan #0
I1212 11:51:53.827430 3375096 AggregationFuzzer.cpp:595] Executing query plan: 
-- Aggregation[FINAL [g0, g1, g2] a0 := stddev_pop("a0")] -> g0:REAL, g1:BOOLEAN, g2:TIMESTAMP, a0:DOUBLE
  -- Aggregation[PARTIAL [g0, g1, g2] a0 := stddev_pop(ROW["c0"])] -> g0:REAL, g1:BOOLEAN, g2:TIMESTAMP, a0:ROW<"":BIGINT,"":DOUBLE,"":DOUBLE>
    -- Values[1000 rows in 10 vectors] -> c0:BIGINT, g0:REAL, g1:BOOLEAN, g2:TIMESTAMP
I1212 11:51:53.850160 3375096 AggregationFuzzer.cpp:610] [ROW ROW<g0:REAL,g1:BOOLEAN,g2:TIMESTAMP,a0:DOUBLE>: 853 elements, no nulls]
velox/exec/tests/utils/QueryAssertions.cpp:986: Failure
Value of: false
  Actual: false
Expected: true
Expected 853, got 853
0 extra rows, 0 missing rows
0 of extra rows:

0 of missing rows:

Unexpected results
E1212 11:51:53.914549 3375096 Exceptions.h:68] Line: velox/exec/tests/AggregationFuzzer.cpp:838, Function:testPlan, Expression: assertEqualResults({expected.result}, {actual.result}) Logically equivalent plans produced different results, Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Logically equivalent plans produced different results
Retriable: False
Expression: assertEqualResults({expected.result}, {actual.result})
Function: testPlan
File: velox/exec/tests/AggregationFuzzer.cpp
Line: 838
Stack trace:
# 0  0x0000000000000000
# 1  0x0000000000000000
# 2  0x0000000000000000
# 3  0x0000000000000000
# 4  facebook::velox::exec::test::(anonymous namespace)::AggregationFuzzer::testPlan(std::shared_ptr<facebook::velox::core::PlanNode const> const&, bool, bool, facebook::velox::exec::test::(anonymous namespace)::ResultOrError const&)
# 5  facebook::velox::exec::test::(anonymous namespace)::AggregationFuzzer::testPlans(std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > > const&, bool, facebook::velox::exec::test::(anonymous namespace)::ResultOrError const&)
# 6  facebook::velox::exec::test::(anonymous namespace)::AggregationFuzzer::verifyAggregation(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
# 7  facebook::velox::exec::test::(anonymous namespace)::AggregationFuzzer::go()
# 8  facebook::velox::exec::test::aggregateFuzzer(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature>, std::allocator<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature>, std::allocator<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature> > > > > >, unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)
# 9  AggregationFuzzerRunner::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)
# 10 main
# 11 __libc_start_call_main
# 12 __libc_start_main_alias_2
# 13 _start

*** Aborted at 1670874713 (Unix time, try 'date -d @1670874713') ***
*** Signal 6 (SIGABRT) (0x37b3c00337ff8) received by PID 3375096 (pthread TID 0x7f2489e50080) (linux TID 3375096) (maybe from PID 3375096, UID 228156) (code: -6), stack trace: ***
    @ 000000000001100e folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 000000000000f731 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:470
    @ 0000000000000000 (unknown)
    @ 000000000009c9d3 __GI___pthread_kill
    @ 00000000000444ec __GI_raise
    @ 000000000002c432 __GI_abort
    @ 00000000000a3fd4 __gnu_cxx::__verbose_terminate_handler()
    @ 00000000000a1b39 __cxxabiv1::__terminate(void (*)())
    @ 00000000000a1ba4 std::terminate()
    @ 00000000000a1ec1 __cxa_rethrow
    @ 0000000000076ab4 facebook::velox::exec::test::(anonymous namespace)::AggregationFuzzer::verifyAggregation(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
                       ./velox/exec/tests/AggregationFuzzer.cpp:1004
    @ 000000000004a504 facebook::velox::exec::test::(anonymous namespace)::AggregationFuzzer::go()
                       ./velox/exec/tests/AggregationFuzzer.cpp:571
    @ 0000000000045c22 facebook::velox::exec::test::aggregateFuzzer(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature>, std::allocator<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature>, std::allocator<std::shared_ptr<facebook::velox::exec::AggregateFunctionSignature> > > > > >, unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)
                       ./velox/exec/tests/AggregationFuzzer.cpp:231
    @ 00000000002c8336 AggregationFuzzerRunner::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)
                       ./velox/exec/tests/AggregationFuzzerRunner.h:93
                       -> ./velox/exec/tests/AggregationFuzzerTest.cpp
    @ 00000000002c66cb main
                       ./velox/exec/tests/AggregationFuzzerTest.cpp:70
    @ 000000000002c656 __libc_start_call_main
    @ 000000000002c717 __libc_start_main_alias_2
    @ 00000000002bc1c0 _start
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116
Aborted (core dumped)

kagamiori commented 1 year ago

This problem is caused by undefined behavior of std::multiset<MaterializedRow, MaterializedRowComparator> used when comparing the expected and actual results in QueryAssertions.cpp. The undefined behavior is due to MaterializedRowComparator not satisfying "strict weak ordering" required by std::multiset.

@Yuhta proposed one possible solution of using sort-merge to replace the use of std::multiset. First, we use the basic comparison without epsilon to ensure well-defined behavior of sorting. Then during the merge phase, we use epsilon in the comparisons to tolerate imprecision of floating point numbers.

mbasmanova commented 1 year ago

More details about the issue:

... the lessThanWithEpsilon logic in MaterializedRowComparator does not establish a "strict weak ordering" as required by the standard. Consider the following example with 3 vectors (a,b,c) and epsilon (e) equal to 3:

a = {2, 4}
b = {0, 8}
c = {4, 0}
e = 3

a < b < c < a, because:
a < b: 2 = 0 (within e), then 4 < 8
b < c: 0 < 4
c < a: 4 = 2 (within e), then 0 < 4

This is not transitive.
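
The cycle above can be reproduced with a small sketch. The `lessThanWithEpsilon` below is a simplified, hypothetical stand-in written for this illustration (it is not the Velox implementation): it treats columns that differ by at most epsilon as equal and lets the first column that differs by more than epsilon decide the order. `formsCycle` is likewise a helper invented here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an epsilon-tolerant row comparator. Columns whose
// values differ by at most `epsilon` are treated as equal; the first column
// that differs by more than `epsilon` decides the order.
bool lessThanWithEpsilon(
    const std::vector<double>& lhs,
    const std::vector<double>& rhs,
    double epsilon) {
  for (std::size_t i = 0; i < lhs.size() && i < rhs.size(); ++i) {
    if (std::fabs(lhs[i] - rhs[i]) > epsilon) {
      return lhs[i] < rhs[i];
    }
  }
  return false; // Every column is within epsilon, so neither row is smaller.
}

// Returns true if x < y, y < z and z < x all hold. A strict weak ordering
// must never allow this.
bool formsCycle(
    const std::vector<double>& x,
    const std::vector<double>& y,
    const std::vector<double>& z,
    double epsilon) {
  return lessThanWithEpsilon(x, y, epsilon) &&
      lessThanWithEpsilon(y, z, epsilon) &&
      lessThanWithEpsilon(z, x, epsilon);
}
```

With a = {2, 4}, b = {0, 8}, c = {4, 0} and e = 3, `formsCycle(a, b, c, 3.0)` holds, while with an epsilon of 0 the same comparator falls back to a plain lexicographic order and the cycle disappears.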

kagamiori commented 1 year ago

Two other failures related to this:

Yuhta commented 1 year ago

@oerling pointed out that sorting could produce a different order once a floating point error is introduced. An example is sorting these 2 tuples:

(0, 0)
(0.0001, 100)

When there is an error of 0.001 in the first column of the first row, the sorted result becomes

(0.0001, 100)
(0.001, 0)

And comparison on second column would fail. In this case we probably need to do maximum bipartite matching.
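
A minimal sketch of why sort-then-compare breaks on this example. `Row` and `equalAfterSort` are names made up for this illustration, and the epsilon of 0.01 used in the note below is an assumption (any tolerance of at least 0.001 would accept the error in the first column).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical two-column row used only for this illustration.
using Row = std::pair<double, double>;

// Sorts both sides with the exact (epsilon-free) ordering, then compares the
// rows position by position with an epsilon tolerance.
bool equalAfterSort(
    std::vector<Row> expected,
    std::vector<Row> actual,
    double epsilon) {
  if (expected.size() != actual.size()) {
    return false;
  }
  std::sort(expected.begin(), expected.end());
  std::sort(actual.begin(), actual.end());
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (std::fabs(expected[i].first - actual[i].first) > epsilon ||
        std::fabs(expected[i].second - actual[i].second) > epsilon) {
      return false;
    }
  }
  return true;
}
```

For expected rows (0, 0), (0.0001, 100) and actual rows (0.001, 0), (0.0001, 100) with epsilon 0.01, every expected row has a partner within epsilon, yet `equalAfterSort` returns false: the exact sort puts (0.0001, 100) first on the actual side, so (0, 0) ends up compared against (0.0001, 100).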

kagamiori commented 1 year ago

> And comparison on second column would fail. In this case we probably need to do maximum bipartite matching.

Hi @Yuhta, thank you for the explanation! I looked at algorithms for the maximum bipartite matching problem. The most commonly used ones, Ford-Fulkerson and Kuhn-Munkres, have O(VE) time complexity, where V is the number of rows in one result and E is V^2 in the worst case for our problem. Since we usually use 1000 as the batch size in our tests, that is on the order of 10^9 operations in the worst case. (The more complex Hopcroft–Karp algorithm has O(V^(1/2) * E) complexity, which would be around 3 * 10^7 operations in the worst case.)

Since the original code that compares results using std::multiset has O(V log V) complexity, I'm worried that the maximum bipartite matching algorithms might be too slow, and I'm wondering what other options we have.

Is it possible for our tests to consider the results to be fixed-point numbers, e.g., we consider both 10.001123 and 10.001345 to be 10.001 when comparing the actual and expected results? (If so, how many digits after the decimal point would be reasonable?)
Are you aware of other possible solutions? I searched for hashing of floating point numbers with epsilon, but it's generally not recommended.

cc @mbasmanova @oerling

mbasmanova commented 1 year ago

For reference, here is how this is implemented in Presto: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/testing/MaterializedRow.java#L185

Yuhta commented 1 year ago

@kagamiori E is usually not as bad as V^2, because an edge exists only if the difference between the 2 sides is under epsilon. The complexity is still at least O(V^2), though, which is worse than the current version. We could do some sorting and clustering to improve it if this becomes a problem.
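
As a rough sketch of what such a matching could look like (not the implementation that ended up in QueryAssertions.cpp), the following uses the simple augmenting-path algorithm for bipartite matching, which is the O(VE) Ford-Fulkerson specialization. An edge is added only when two rows agree within epsilon on every column, and building the edge list is the O(V^2) step. `Row`, `withinEpsilon`, `tryMatch` and `matchWithEpsilon` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Row = std::vector<double>; // Hypothetical all-floating-point row.

// True if every column of `a` is within `epsilon` of the same column of `b`.
bool withinEpsilon(const Row& a, const Row& b, double epsilon) {
  if (a.size() != b.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] - b[i]) > epsilon) {
      return false;
    }
  }
  return true;
}

// Searches for an augmenting path that starts at expected row `u`.
bool tryMatch(
    std::size_t u,
    const std::vector<std::vector<std::size_t>>& edges,
    std::vector<bool>& visited,
    std::vector<int>& matchOfActual) {
  for (std::size_t v : edges[u]) {
    if (visited[v]) {
      continue;
    }
    visited[v] = true;
    // Take `v` if it is free, or if its current partner can move elsewhere.
    if (matchOfActual[v] < 0 ||
        tryMatch(
            static_cast<std::size_t>(matchOfActual[v]),
            edges,
            visited,
            matchOfActual)) {
      matchOfActual[v] = static_cast<int>(u);
      return true;
    }
  }
  return false;
}

// True if every expected row can be paired with a distinct actual row that
// is within `epsilon` on all columns.
bool matchWithEpsilon(
    const std::vector<Row>& expected,
    const std::vector<Row>& actual,
    double epsilon) {
  if (expected.size() != actual.size()) {
    return false;
  }
  const std::size_t n = expected.size();
  std::vector<std::vector<std::size_t>> edges(n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      if (withinEpsilon(expected[i], actual[j], epsilon)) {
        edges[i].push_back(j);
      }
    }
  }
  std::vector<int> matchOfActual(n, -1);
  for (std::size_t u = 0; u < n; ++u) {
    std::vector<bool> visited(n, false);
    if (!tryMatch(u, edges, visited, matchOfActual)) {
      return false; // Some expected row has no partner left.
    }
  }
  return true;
}
```

Unlike sort-then-compare, this accepts the two-tuple example with the 0.001 error, because the pairing does not depend on the order of the rows.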

For the rounding solution (which, as @mbasmanova pointed out, is also used in Presto), there is a problem: if 2 values are close enough but fall into two different buckets, we will have a false positive. For example, if we round 2.715 to 2 digits after the decimal point, it becomes 2.72, but 2.714999999 becomes 2.71, even though they are very close to each other.
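
The bucket-boundary problem can be shown with a short sketch. `roundedBucket` and `equalAfterRounding` are hypothetical helpers that round to a fixed number of digits after the decimal point (Presto's own rounding is not reproduced here). The note below uses 2.7151 and 2.7149 rather than 2.715, since 2.715 has no exact binary representation and sits right on the rounding boundary.

```cpp
#include <cmath>

// Maps `value` to an integer bucket by rounding it to `digits` digits after
// the decimal point, e.g. 2.7151 with 2 digits lands in bucket 272.
long long roundedBucket(double value, int digits) {
  const double scale = std::pow(10.0, digits);
  return std::llround(value * scale);
}

// Treats two values as equal only if they land in the same rounded bucket.
bool equalAfterRounding(double lhs, double rhs, int digits) {
  return roundedBucket(lhs, digits) == roundedBucket(rhs, digits);
}
```

With 2 digits, `equalAfterRounding(2.7151, 2.7149, 2)` is false even though the values differ by only 0.0002, while 2.7112 and 2.7114, which are just as close, compare equal.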

mbasmanova commented 1 year ago

Perhaps, most use cases are aggregations with non-floating point keys and floating point aggregates. We could first bucket the rows by non-floating point columns, then compare with epsilon within the buckets. Assuming buckets will be very small (1 row), we could use some simple algorithms and throw if bucket size is large (> 5 rows).
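
One way this bucketing idea could be sketched, assuming for simplicity a single integer key and a single floating point aggregate per row. `Row`, `bucketMatches` and `equalByBuckets` are hypothetical names, the bucket limit of 5 comes from the suggestion above, and trying every permutation inside a bucket is just one possible "simple algorithm".

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical row: one non-floating-point key and one floating-point value.
struct Row {
  std::int64_t key;
  double value;
};

// Brute force: tries every ordering of `rhs` until all positions agree with
// `lhs` within `epsilon`. Only viable because buckets are kept tiny.
bool bucketMatches(
    const std::vector<double>& lhs,
    std::vector<double> rhs,
    double epsilon) {
  std::sort(rhs.begin(), rhs.end());
  do {
    bool allClose = true;
    for (std::size_t i = 0; i < lhs.size(); ++i) {
      if (std::fabs(lhs[i] - rhs[i]) > epsilon) {
        allClose = false;
        break;
      }
    }
    if (allClose) {
      return true;
    }
  } while (std::next_permutation(rhs.begin(), rhs.end()));
  return false;
}

// Groups rows by exact key, then compares values with epsilon per bucket.
// Throws if a bucket is larger than `maxBucketSize`.
bool equalByBuckets(
    const std::vector<Row>& expected,
    const std::vector<Row>& actual,
    double epsilon,
    std::size_t maxBucketSize = 5) {
  if (expected.size() != actual.size()) {
    return false;
  }
  std::map<std::int64_t, std::pair<std::vector<double>, std::vector<double>>>
      buckets;
  for (const Row& row : expected) {
    buckets[row.key].first.push_back(row.value);
  }
  for (const Row& row : actual) {
    buckets[row.key].second.push_back(row.value);
  }
  for (const auto& entry : buckets) {
    const std::vector<double>& lhs = entry.second.first;
    const std::vector<double>& rhs = entry.second.second;
    if (lhs.size() != rhs.size()) {
      return false; // A key is missing or duplicated on one side.
    }
    if (lhs.size() > maxBucketSize) {
      throw std::runtime_error("Bucket too large for the simple comparison");
    }
    if (!bucketMatches(lhs, rhs, epsilon)) {
      return false;
    }
  }
  return true;
}
```

Because keys are compared exactly, the ordering of keys stays a valid strict weak ordering, and the epsilon comparison is confined to a handful of rows per key.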

Yuhta commented 1 year ago

@mbasmanova Yes, that's one optimization. We should reorder the columns to put floating point columns at the end, and sort and pair rows by the non-floating point columns first.

kagamiori commented 1 year ago

> @kagamiori E is usually not as bad as V^2, because an edge exists only if the difference between the 2 sides is under epsilon. The complexity is still at least O(V^2), though, which is worse than the current version. We could do some sorting and clustering to improve it if this becomes a problem.
>
> For the rounding solution (which, as @mbasmanova pointed out, is also used in Presto), there is a problem: if 2 values are close enough but fall into two different buckets, we will have a false positive. For example, if we round 2.715 to 2 digits after the decimal point, it becomes 2.72, but 2.714999999 becomes 2.71, even though they are very close to each other.

Thanks @mbasmanova and @Yuhta! Presto rounds floating point numbers to 5 significant digits by default. The problem that @Yuhta pointed out can indeed occur. I'll prototype the Ford-Fulkerson or Kuhn-Munkres algorithm to see if the performance of maximum bipartite matching is acceptable for us.

kagamiori commented 1 year ago

> Perhaps, most use cases are aggregations with non-floating point keys and floating point aggregates. We could first bucket the rows by non-floating point columns, then compare with epsilon within the buckets. Assuming buckets will be very small (1 row), we could use some simple algorithms and throw if bucket size is large (> 5 rows).
>
> @mbasmanova Yes, that's one optimization. We should reorder the columns to put floating point columns at the end, and sort and pair rows by the non-floating point columns first.

@mbasmanova @Yuhta, sorry, I somehow didn't see your last two messages when I replied. I think the optimization you mentioned would work for aggregation tests. But assertEqualResults() and assertResults() (where this bug is) are also used in the join fuzzer, TaskTest, HashJoinTest, MergeJoinTest, and OperatorTestBase. For these use cases, we may not be guaranteed to have small buckets.

mbasmanova commented 1 year ago

> But assertEqualResults() and assertResults() (where this bug is) are also used in the join fuzzer, TaskTest, HashJoinTest, MergeJoinTest, and OperatorTestBase. For these use cases, we may not be guaranteed to have small buckets.

That's a good point. In these cases, though, floating point values tend to be copied directly from input to output rather than computed. Hence, the chance of divergence is much lower than in aggregations.
