facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.42k stars 1.12k forks source link

SparkAggregationFuzzerTest doesn't share the default AggregationFuzzerRunner::customVerificationFunctions_ list #5871

Open kagamiori opened 1 year ago

kagamiori commented 1 year ago

Description

This problem was discovered in https://github.com/facebookincubator/velox/pull/5711.

Some aggregation functions require special handling during verification in fuzzer. There is a default list of them in AggregationFuzzerRunner. Both the Presto and Spark aggregation fuzzer test uses AggregationFuzzerRunner. However, the Presto aggregation fuzzer test uses this default AggregationFuzzerRunner::customVerificationFunctions, while the Spark aggregation fuzzer test passes its own customVerificationFunctions list to AggregationFuzzerRunner::runFuzzer() and this replaces the default one. This means that if there is an aggregation function used in both Presto and Spark and require custom verification, we'll need to add it to both AggregationFuzzerRunner::customVerificationFunctions and customVerificationFunctions in SparkAggregationFuzzerTest, which is not the best setting.

https://github.com/facebookincubator/velox/blob/7543853b46c5dfe1451e2c70b31ef5dd1f661c84/velox/exec/tests/AggregationFuzzerRunner.h#L83-L115

https://github.com/facebookincubator/velox/blob/7543853b46c5dfe1451e2c70b31ef5dd1f661c84/velox/exec/tests/SparkAggregationFuzzerTest.cpp#L59-L63

If we step back and look at the design, there are three situations:

  1. If there are separate Presto and Spark aggregation functions in Velox that have the same function name, then having a shared customVerificationFunctions_ list in AggregationFuzzerRunner doesn't make sense. Instead, individual lists of custom-verification functions should be defined in the Presto and Spark aggregation fuzzer tests separately.
  2. If all Presto and Spark aggregation functions of the same names share the same implementation, the custom-verification-functions list defined in SparkAggregationFuzzerTest should be complementary to AggregationFuzzerRunner::customVerificationFunctions_ rather than a replacement of it.
  3. If some Presto and Spark aggregation functions of the same names share the same implementation and some are implemented separately, then we can take the same approach as case (1).

Error Reproduction

N/A

Relevant logs

N/A
mbasmanova commented 1 year ago

CC: @rui-mo

rui-mo commented 1 year ago

If some Presto and Spark aggregation functions of the same names share the same implementation and some are implemented separately, then we can take the same approach as case (1).

Looks the third case is closer to current status, then below action should be applied base on the suggestion from case (1), am I right?

Individual lists of custom-verification functions should be defined in the Presto and Spark aggregation fuzzer tests separately.