facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0
3.52k stars 1.15k forks source link

VectorFuzzer doesnt produce NaN's/ +Inf, -Inf for floating point types, negative integers for Integers . #8638

Open kgpai opened 9 months ago

kgpai commented 9 months ago

Description

What Vector Fuzzer's rand function for double, float (https://github.com/facebookincubator/velox/blob/main/velox/vector/fuzzer/VectorFuzzer.cpp#L92 ) in its current implementation doesnt support Nan, positive infinity, negative inifinity . It also doesnt seem to generate negative values for Integers (possibly even MAX_INT, MIN_INT etc). There are often bugs associated with these values and we should add support to increase frequency of these values in the fuzzer.

Why Recently a bug was found in difference of behaviour of max in Velox and Presto java :

c++ ======================================
presto:feed> select max(x) from (values 4.0,nan(),null) T(x);
   4.0

presto:feed> select max(x) from (values nan(),nan(),null) T(x);
 -1.7976931348623157E308

presto:feed> select min(x) from (values 4.0,nan(),null) T(x);
   NaN

presto:feed> select min(x) from (values nan(),nan(),null) T(x);
   NaN

java =====================================
presto:feed> select max(x) from (values 4.0,nan(),null) T(x);
   4.0

presto:feed> select max(x) from (values nan(),nan(),null) T(x);
   NaN

presto:feed> select min(x) from (values 4.0,nan(),null) T(x);
   4.0

presto:feed> select min(x) from (values nan(),nan(),null) T(x);
   NaN

How Once support is added to the VectorFuzzer , you also need to run all the fuzzers to ensure that these run stable and no new bugs are discovered.

Error Reproduction

No response

Relevant logs

No response

mbasmanova commented 9 months ago

CC: @pedroerp @aditi-pandit

pedroerp commented 9 months ago

For integers, it doesn't generate negative numbers because boost's uniform_int_distribution's range by default starts in zero. One can override it with min()/max() using a different constructor. But as we with other fuzzer changes, the challenge is catching any potential bugs this might uncover.

https://github.com/facebookincubator/velox/blob/main/velox/vector/fuzzer/VectorFuzzer.cpp#L88

kgpai commented 9 months ago

I am going to start off with adding some weights to generate 'NaN', 'Inf' , -Inf at around 10% chance each , so 30% of time one of NaN, Inf, -Inf are generated and 70% values from the 0-1 range , run fuzzers and see what errors we get.