Closed drabastomek closed 3 years ago
STDDEV and STDDEV_POP are not explicitly implemented in BSQL. Instead, Calcite converts them into a composition of other operations, which we should support. The first step in troubleshooting this is to make sure that the logical plan provided by Calcite looks correct.
Logical plans are as follows:
bc.explain('SELECT category, STDDEV(np_float64) FROM dtype_test GROUP BY category')
produces
LogicalProject(category=[$0], EXPR$1=[POWER(/(-($1, /(*($2, $2), $3)), CASE(=($3, 1), null:BIGINT, -($3, 1))), 0.5:DECIMAL(2, 1))])
LogicalProject(category=[$0], $f1=[CASE(=($2, 0), null:DOUBLE, $1)], $f2=[CASE(=($4, 0), null:DOUBLE, $3)], $f3=[$4])
LogicalAggregate(group=[{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($1)], agg#3=[COUNT($1)])
LogicalProject(category=[$0], np_float64=[$1], $f2=[*($1, $1)])
BindableTableScan(table=[[main, dtype_test]], projects=[[1, 0]], aliases=[[category, np_float64]])
bc.explain('SELECT category, STDDEV_POP(np_float64) FROM dtype_test GROUP BY category')
produces
LogicalProject(category=[$0], EXPR$1=[POWER(/(-($1, /(*($2, $2), $3)), $3), 0.5:DECIMAL(2, 1))])
LogicalProject(category=[$0], $f1=[CASE(=($2, 0), null:DOUBLE, $1)], $f2=[CASE(=($4, 0), null:DOUBLE, $3)], $f3=[$4])
LogicalAggregate(group=[{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($1)], agg#3=[COUNT($1)])
LogicalProject(category=[$0], np_float64=[$1], $f2=[*($1, $1)])
BindableTableScan(table=[[main, dtype_test]], projects=[[1, 0]], aliases=[[category, np_float64]])
bc.explain('SELECT category, STDDEV_SAMP(np_float64) FROM dtype_test GROUP BY category')
produces
LogicalProject(category=[$0], EXPR$1=[POWER(/(-($1, /(*($2, $2), $3)), CASE(=($3, 1), null:BIGINT, -($3, 1))), 0.5:DECIMAL(2, 1))])
LogicalProject(category=[$0], $f1=[CASE(=($2, 0), null:DOUBLE, $1)], $f2=[CASE(=($4, 0), null:DOUBLE, $3)], $f3=[$4])
LogicalAggregate(group=[{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($1)], agg#3=[COUNT($1)])
LogicalProject(category=[$0], np_float64=[$1], $f2=[*($1, $1)])
BindableTableScan(table=[[main, dtype_test]], projects=[[1, 0]], aliases=[[category, np_float64]])
The logical plans are identical for STDDEV and STDDEV_SAMP, so this is a Calcite issue rather than ours.
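To make the difference between the plans concrete, here is a minimal Python sketch of the arithmetic the top-level LogicalProject expressions appear to encode: the square root of (SUM(x*x) - SUM(x)*SUM(x)/COUNT(x)) divided by either COUNT(x) (the STDDEV_POP plan) or COUNT(x) - 1 (the STDDEV and STDDEV_SAMP plans). The function name `stddev_from_plan` and the sample values are hypothetical and used only for illustration; this is not BSQL or Calcite code.

```python
import math
import statistics


def stddev_from_plan(values, sample):
    """Evaluate the expanded expression from the logical plans above.

    sample=True  mirrors the STDDEV / STDDEV_SAMP plans (divide by COUNT - 1).
    sample=False mirrors the STDDEV_POP plan (divide by COUNT).
    """
    n = len(values)                      # COUNT(x)
    if n == 0:
        return None                      # CASE(=(COUNT, 0), null, $SUM0)
    sum_sq = sum(v * v for v in values)  # $SUM0(x * x)
    total = sum(values)                  # $SUM0(x)
    if sample:
        if n == 1:
            return None                  # CASE(=($3, 1), null, -($3, 1))
        denom = n - 1
    else:
        denom = n
    # POWER((sum_sq - total * total / n) / denom, 0.5)
    return math.pow((sum_sq - total * total / n) / denom, 0.5)


# Hypothetical sample data, not taken from dtype_test.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("plan, population:", stddev_from_plan(data, sample=False))
print("plan, sample:    ", stddev_from_plan(data, sample=True))
print("statistics.pstdev:", statistics.pstdev(data))
print("statistics.stdev: ", statistics.stdev(data))
```

Under these assumptions, the population variant should agree with `statistics.pstdev` and the sample variant with `statistics.stdev`, which makes it easy to check which definition a given STDDEV result follows.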
Describe the bug
The STDDEV SQL function produces the wrong result: given how the standard deviation, population standard deviation, and sample standard deviation are defined, it should produce the same results as STDDEV_POP, but it produces the same results as STDDEV_SAMP.
Steps/Code to reproduce bug
Steps to reproduce
produces
Expected behavior
Table should produce the following
The same results were replicated in MariaDB/MySQL database.
Environment overview (please complete the following information)
'ba4a97a3ce3f1c733a135bf88518f2f67b12b519'