apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
447 stars 100 forks source link

Comet sort order different to Spark for 0.0 and -0.0 #353

Open andygrove opened 2 weeks ago

andygrove commented 2 weeks ago

Describe the bug

During testing with CAST, we execute queries with ORDER by a and it seems the results are ordered differently for 0.0 vs -0.0 with float and double.

![0.0,0.0]                                [-0.0,-0.0]
![-0.0,-0.0]                              [0.0,0.0]

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

andygrove commented 2 weeks ago

Scala/Java seems to take the sign into account during sorting and Rust does not. Perhaps we just need to document this as an edge case in the compatibility guide.

Scala

val x: Seq[Float] = Seq(0.0f, -0.0f, 0.0f, -0.0f, 1.0f, -1.0f)
println(x.sorted)

Output: List(-1.0, -0.0, -0.0, 0.0, 0.0, 1.0)

Rust

let mut v = vec![0.0_f32, -0.0_f32, 0.0_f32, -0.0_f32, 1.0_f32, -1.0_f32];
v.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Greater));
println!("{:?}", v);

Output: [-1.0, 0.0, -0.0, 0.0, -0.0, 1.0]

andygrove commented 2 weeks ago

We should also test with NaN in sorting

parthchandra commented 2 weeks ago

TIL: IEEE 754 says that +0.0 and -0.0 are equal and in C and Java +0.0 == -0.0 is true but in Java the Double.equals does not treat the two as equal. https://en.wikipedia.org/wiki/Signed_zero#Comparisons

I don't think SQL distinguishes between the two so I would not consider this an issue that needs to be fixed.

parthchandra commented 2 weeks ago

We should also test with NaN in sorting

Also +infinity and -infinity while we are at it.

andygrove commented 2 weeks ago

It is probably worth having a section in the compatibility guide specifically for Rust vs Java differences like this.