Open kagamiori opened 7 months ago
When the input of count(distinct x) contains multiple NaN values, Velox treats every NaN as a distinct value, while Presto treats all NaNs as duplicates of one another.
Velox:
TEST_F(CountAggregationTest, distinct) {
  auto nan = std::numeric_limits<double>::quiet_NaN();
  auto data = makeRowVector({
      makeFlatVector<double>({1.1, nan, nan, nan, nan, nan, nan, nan}),
  });
  createDuckDbTable({data});

  // Global aggregation.
  auto testGlobal = [&](const std::string& input) {
    auto plan = PlanBuilder()
                    .values({data})
                    .singleAggregation({}, {fmt::format("count(distinct {})", input)})
                    .planNode();
    AssertQueryBuilder(plan, duckDbQueryRunner_)
        .assertResults(fmt::format("SELECT count(distinct {}) FROM tmp", input));
  };

  testGlobal("c0"); // Velox result is 8
}
Presto:
SELECT COUNT(DISTINCT c0) FROM ( VALUES (1.1), (NAN()), (NAN()) ) t(c0); -- Presto result is 2
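To make the two semantics concrete, here is a minimal standalone sketch. It is not Velox or Presto code, and it does not claim to show where the behavior comes from in Velox. It only illustrates that deduplicating with plain `operator==` counts every NaN separately (IEEE 754 defines NaN == NaN as false), while a NaN-aware hash and equality collapse them into one value. All names below (`countDistinctNaive`, `countDistinctNanAware`, `NanAwareEqual`, `NanAwareHash`) are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts distinct values using plain operator==.
// NaN == NaN is false under IEEE 754, so every NaN looks like a new value.
inline std::size_t countDistinctNaive(const std::vector<double>& values) {
  std::vector<double> seen;
  for (double v : values) {
    bool found = false;
    for (double s : seen) {
      if (s == v) {
        found = true;
        break;
      }
    }
    if (!found) {
      seen.push_back(v);
    }
  }
  return seen.size();
}

// Hypothetical comparator: treats any two NaNs as equal, otherwise uses ==.
struct NanAwareEqual {
  bool operator()(double a, double b) const {
    if (std::isnan(a) && std::isnan(b)) {
      return true;
    }
    return a == b;
  }
};

// Hypothetical hasher: maps every NaN bit pattern to one canonical hash so
// that it stays consistent with NanAwareEqual.
struct NanAwareHash {
  std::size_t operator()(double v) const {
    if (std::isnan(v)) {
      return std::hash<double>{}(std::numeric_limits<double>::quiet_NaN());
    }
    return std::hash<double>{}(v);
  }
};

// Hypothetical helper: counts distinct values with all NaNs as duplicates.
inline std::size_t countDistinctNanAware(const std::vector<double>& values) {
  std::unordered_set<double, NanAwareHash, NanAwareEqual> seen(
      values.begin(), values.end());
  return seen.size();
}
```

With the same input as the Velox test above (1.1 followed by seven NaNs), `countDistinctNaive` returns 8 and `countDistinctNanAware` returns 2, matching the Velox and Presto results reported here.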
System information: N/A
Relevant logs: No response
cc @mbasmanova