brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Incorrect comparison result for "equal" values of different float types #5295

Open philrz opened 1 week ago

tl;dr

As a user, this result seems incorrect to me.

$ echo '0.99 (float32)' | zq -Z 'yield this > 0.99' -
true

Details

Repro is with Zed commit 2357e17.

This came up while trying to reproduce the equivalent of the mgbench bench1 q4 in Zed.

Compare the three queries below.

$ zq -version
Version: v1.17.0-74-g2357e178

$ echo '0.99' | zq -Z 'yield this > 0.99' -
false

$ echo '0.99 (float32)' | zq -Z 'yield this > 0.99' -
true

$ echo '0.99 (float32)' | zq -Z 'yield this > float32(0.99)' -
false

I know floats are notoriously fussy when it comes to establishing strict equality between values, so I assume the explanation has something to do with that.
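For illustration, here is a minimal Python sketch of one plausible explanation. It is not taken from the Zed code. It assumes the float32 value is widened to float64 before being compared against the float64 literal, which is consistent with the output above but which I have not confirmed in the implementation. The helper name to_float32 is made up for this example.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 float32, then widen it back to a
    Python float (which is a float64). Hypothetical helper for this sketch."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# What a float32 field would hold after shaping 0.99 to float32.
stored = to_float32(0.99)

# A bare 0.99 literal is a float64.
literal = 0.99

# The nearest float32 to 0.99 lies slightly above the nearest float64 to 0.99,
# so the widened value compares as greater than the literal.
print(repr(stored))
print(stored > literal)              # True, like 'yield this > 0.99' on the float32 input

# Rounding the literal to float32 first puts both sides on the same value.
print(stored > to_float32(literal))  # False, like 'yield this > float32(0.99)'
```

If that assumption holds, the second query is comparing two different approximations of 0.99 rather than two equal values, and the cast in the third query makes both sides use the same approximation.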

This came up because the mgbench bench1 DDL stores the values as 32-bit floats and the query then applies the filter cpu_wio > 0.99. To attempt the equivalent in Zed, I shaped the values to float32 at load time and applied a similar filter in the Zed query. Because of the problem described here, several cpu_wio values equal to 0.99 made it past the filter and were counted in the query, producing an incorrect result for the benchmark. I can certainly work around this for now by adding the float32(0.99) cast, but I'm not sure if this is something we'd expect users to do.