apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Add fuzz support for `Timestamp`, `Binary` and `Float` #13279

Open alamb opened 5 hours ago

alamb commented 5 hours ago

Is your feature request related to a problem or challenge?

Part of https://github.com/apache/datafusion/issues/7065

We have a great aggregation fuzz tester that is run like this:

cargo test --test fuzz -- aggregate

This fuzz tester is important to cover all the various combinations of types and columns and aggregates, given DataFusion has many different specialized code paths based on types.

The fuzz tester currently covers many different types, but not all of them. This ticket covers adding support for a few more types.

Describe the solution you'd like

Add support for the following data types:

  1. Float32/Float64
  2. Timestamp (Timestamp(Seconds, ..), Timestamp(Milliseconds, ..), ...)
  3. Binary / LargeBinary / BinaryView

Describe alternatives you've considered

I think we could follow the model in this PR from @LeslieKid, which added time/interval/decimal/utf8view: https://github.com/apache/datafusion/pull/13226

Additional context

No response

jonathanc-n commented 5 hours ago

take

jonathanc-n commented 4 hours ago

@alamb Is there a certain range within which floating point numbers are compared as equal to each other? Any best practices?

alamb commented 3 hours ago

> @alamb Is there a certain range within which floating point numbers are compared as equal to each other? Any best practices?

For determining groups, the comparison is exact equality, even for floats. (This is different for aggregates like SUM / AVG, where floating point roundoff needs to be accounted for.)
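
To illustrate the distinction, here is a minimal standalone sketch using only the Rust standard library. It is not DataFusion code: `sum_by_exact_key` and `approx_eq` are hypothetical helpers, and keying on `f64::to_bits` is just one way to get exact equality (it treats `0.0` and `-0.0` as different keys, which is an assumption of this sketch and not a statement about how DataFusion handles them). The tolerance value is likewise only an example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: groups rows by an f64 key using exact (bitwise)
/// equality and returns the per-group sum of the values.
fn sum_by_exact_key(rows: &[(f64, f64)]) -> HashMap<u64, f64> {
    let mut groups: HashMap<u64, f64> = HashMap::new();
    for &(key, value) in rows {
        // f64 is not Hash/Eq, so the raw bit pattern serves as the group key.
        *groups.entry(key.to_bits()).or_insert(0.0) += value;
    }
    groups
}

/// Hypothetical helper: compares two aggregate results with a relative
/// tolerance, since summation order can change the rounded result.
fn approx_eq(a: f64, b: f64, rel_tol: f64) -> bool {
    let scale = a.abs().max(b.abs()).max(1.0);
    (a - b).abs() <= rel_tol * scale
}

fn main() {
    // 0.1 + 0.2 is not bit-identical to 0.3, so the two keys form
    // separate groups under exact equality.
    let rows = [(0.1 + 0.2, 1.0), (0.3, 1.0), (0.3, 1.0)];
    let groups = sum_by_exact_key(&rows);
    assert_eq!(groups.len(), 2);
    assert_eq!(groups[&0.3f64.to_bits()], 2.0);

    // The same three values summed in a different order give results
    // that differ in the last bits, so aggregates are compared approximately.
    let forward = 0.1 + 0.2 + 0.3;
    let backward = 0.3 + 0.2 + 0.1;
    assert!(forward != backward);
    assert!(approx_eq(forward, backward, 1e-12));
}
```

In a fuzz test this would suggest asserting exact matches on float group keys while allowing a small tolerance on float SUM / AVG outputs.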