Open kagamiori opened 2 weeks ago
I noticed this on some of my PRs as well, I was wondering if it my be an issue exposed by https://github.com/facebookincubator/velox/pull/10006 with NaNs in the off1 column
Update:
I reproduced the error with this command: velox_window_fuzzer_test --enable_window_reference_verification --presto_url=http://127.0.0.1:8080 --logtostderr=1 --minloglevel=0 --duration_sec=360 --num_batches=1 --batch_size=30 --only=stddev_samp --seed=2356821713 --duration_sec=0 --steps=1 -v=1
.
This iteration runs Window[1][partition by [p0] order by [s0 DESC NULLS FIRST] w0 := stddev_samp(ROW["c0"]) RANGE between off0 FOLLOWING and off1 FOLLOWING]
. There is one partition containing three input rows with the exact same values of sorting key, frame start, and frame end, i.e., they are peer rows and should all return the same window result. I found that WindowPartition::computeKRangeFrameBounds() doesn't handle NaN correctly and produces incorrect frame bounds. This further makes AggregateWindowFunction::analyzeFrameValues() to incorrectly consider the frames in this partition to be incremental and calls incrementalAggregation().
I'm continuing investigating.
Example queries to reproduce the mismatch:
SELECT
s0,
SUM(c0) OVER (
ORDER BY
s0 asc
RANGE BETWEEN 0.0 PRECEDING AND nan() PRECEDING
)
FROM (
VALUES
(1, 1.0),
(2, 1.0),
(3, 1.0)
) t(c0, s0); -- different results
SELECT
s0,
SUM(c0) OVER (
ORDER BY
s0 asc
RANGE BETWEEN nan() PRECEDING AND 0.0 PRECEDING
)
FROM (
VALUES
(1, 1.0),
(2, 1.0),
(3, 1.0)
) t(c0, s0); -- Presto throws while Prestissimo doesn't
SELECT
s0,
SUM(c0) OVER (
ORDER BY
s0 desc
RANGE BETWEEN 0.0 FOLLOWING AND nan() FOLLOWING
)
FROM (
VALUES
(1, 1.0),
(2, 1.0),
(3, 1.0)
) t(c0, s0); -- Presto throws while Prestissimo doesn't
SELECT
s0,
SUM(c0) OVER (
ORDER BY
s0 desc
RANGE BETWEEN nan() FOLLOWING AND 0.0 FOLLOWING
)
FROM (
VALUES
(1, 1.0),
(2, 1.0),
(3, 1.0)
) t(c0, s0); -- different results
Both Presto and Velox results of the queries above are incorrect. https://github.com/prestodb/presto/issues/23822
Hi @aditi-pandit and @pramodsatya, the recent window fuzzer enhancement of k-range frames found this bug in both Presto and Velox. We're considering making window operations return NULL when any of the frame bound is NaN (https://github.com/prestodb/presto/issues/23822#issuecomment-2411843864). Please let us know if you have any thoughts. Thanks!
Hi @aditi-pandit and @pramodsatya, the recent window fuzzer enhancement of k-range frames found this bug in both Presto and Velox. We're considering making window operations return NULL when any of the frame bound is NaN (prestodb/presto#23822 (comment)). Please let us know if you have any thoughts. Thanks!
Thanks @kagamiori, returning null for Nan frame bounds would be better.
Description
Example failure: https://github.com/facebookincubator/velox/actions/runs/11247502907/job/31271630633
Error Reproduction
No response
Relevant logs
No response