apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Fix string view LIKE checks with NULL values #6662

Closed: findepi closed this 2 weeks ago

findepi commented 3 weeks ago

Which issue does this PR close?

Rationale for this change

Fixes a correctness bug.

What changes are included in this PR?

Fix the result of `LIKE` on StringView arrays when the tested values include nulls.
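
For readers unfamiliar with the expected semantics, here is a minimal, std-only sketch of how `LIKE` should behave when some tested values are null: a null input yields a null output, and the non-null slots are evaluated independently of their null neighbours. This is not the arrow-rs implementation and does not reproduce the bug fixed here; `like_match` and `like_nullable` are hypothetical names, nullable values are modelled as `Option<&str>` rather than a real StringView array, and escape characters are not handled.

```rust
/// Minimal LIKE matcher (illustrative only, not the arrow-rs code):
/// `%` matches any run of characters, `_` matches exactly one character.
/// Escape characters are not handled in this sketch.
fn like_match(value: &str, pattern: &str) -> bool {
    let v: Vec<char> = value.chars().collect();
    // dp[j] is true when the pattern consumed so far matches v[..j].
    let mut dp = vec![false; v.len() + 1];
    dp[0] = true;
    for pc in pattern.chars() {
        let mut next = vec![false; v.len() + 1];
        if pc == '%' {
            // `%` may absorb zero or more characters.
            let mut seen = false;
            for j in 0..=v.len() {
                seen |= dp[j];
                next[j] = seen;
            }
        } else {
            // A literal or `_` consumes exactly one character.
            for j in 1..=v.len() {
                next[j] = dp[j - 1] && (pc == '_' || pc == v[j - 1]);
            }
        }
        dp = next;
    }
    dp[v.len()]
}

/// Apply LIKE to a nullable column: `None` stays `None`,
/// every other slot is matched against the pattern on its own.
fn like_nullable(values: &[Option<&str>], pattern: &str) -> Vec<Option<bool>> {
    values
        .iter()
        .map(|&v| v.map(|s| like_match(s, pattern)))
        .collect()
}

fn main() {
    let values = [Some("arrow"), None, Some("parquet"), Some("arrow-rs")];
    // The null slot must stay null and must not disturb its neighbours.
    println!("{:?}", like_nullable(&values, "arrow%"));
}
```

A regression test for this kind of bug would typically interleave nulls with matching and non-matching values, as `main` does, so that any misalignment between values and validity shows up in the output.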

Are there any user-facing changes?

Yes

findepi commented 3 weeks ago

cc @goldmedal

findepi commented 2 weeks ago

cc @Dandandan @crepererum

alamb commented 2 weeks ago

Here are my benchmark results. My conclusion is that there is some non-trivial variability in the benchmarks, but I don't think this PR changes performance substantially.

group                                              findepi_fix-string-view-like-checks-with-null-values-1121b8    master
-----                                              -----------------------------------------------------------    ------
ilike_utf8 scalar complex                          1.03      2.8±0.07ms        ? ?/sec                            1.00      2.7±0.06ms        ? ?/sec
ilike_utf8 scalar contains                         1.04      4.3±0.06ms        ? ?/sec                            1.00      4.1±0.05ms        ? ?/sec
ilike_utf8 scalar ends with                        1.00  1255.7±49.78µs        ? ?/sec                            1.00  1250.3±33.44µs        ? ?/sec
ilike_utf8 scalar equals                           1.00   729.4±23.72µs        ? ?/sec                            1.07   781.1±24.73µs        ? ?/sec
ilike_utf8 scalar starts with                      1.01  1151.3±46.67µs        ? ?/sec                            1.00  1144.6±35.48µs        ? ?/sec
ilike_utf8_scalar_dyn dictionary[10] string[4])    1.00     77.9±0.30µs        ? ?/sec                            1.00     77.6±0.07µs        ? ?/sec
like_utf8 scalar complex                           1.02  1952.8±24.11µs        ? ?/sec                            1.00  1909.3±20.85µs        ? ?/sec
like_utf8 scalar contains                          1.10  1728.7±21.70µs        ? ?/sec                            1.00  1565.7±13.97µs        ? ?/sec
like_utf8 scalar ends with                         1.00    421.4±9.37µs        ? ?/sec                            1.01    424.1±7.08µs        ? ?/sec
like_utf8 scalar equals                            1.00    109.7±0.29µs        ? ?/sec                            1.16    127.1±0.33µs        ? ?/sec
like_utf8 scalar starts with                       1.00   345.5±12.23µs        ? ?/sec                            1.00   346.2±11.37µs        ? ?/sec
like_utf8_scalar_dyn dictionary[10] string[4])     1.00     77.6±0.22µs        ? ?/sec                            1.00     77.5±0.10µs        ? ?/sec
like_utf8view scalar complex                       1.06    195.4±0.68ms        ? ?/sec                            1.00    183.9±0.71ms        ? ?/sec
like_utf8view scalar contains                      1.07    148.5±0.33ms        ? ?/sec                            1.00    138.8±0.20ms        ? ?/sec
like_utf8view scalar ends with 13 bytes            1.00     43.0±0.30ms        ? ?/sec                            1.11     47.7±0.17ms        ? ?/sec
like_utf8view scalar ends with 4 bytes             1.00     43.7±0.30ms        ? ?/sec                            1.11     48.7±0.12ms        ? ?/sec
like_utf8view scalar ends with 6 bytes             1.00     43.5±0.29ms        ? ?/sec                            1.11     48.5±0.16ms        ? ?/sec
like_utf8view scalar equals                        1.00     33.6±0.20ms        ? ?/sec                            1.07     35.8±0.07ms        ? ?/sec
like_utf8view scalar starts with 13 bytes          1.00     46.6±0.28ms        ? ?/sec                            1.01     47.2±0.15ms        ? ?/sec
like_utf8view scalar starts with 4 bytes           1.00     33.2±0.11ms        ? ?/sec                            1.03     34.3±0.18ms        ? ?/sec
like_utf8view scalar starts with 6 bytes           1.00     47.1±0.24ms        ? ?/sec                            1.02     47.8±0.21ms        ? ?/sec
nilike_utf8 scalar complex                         1.06      2.9±0.07ms        ? ?/sec                            1.00      2.7±0.06ms        ? ?/sec
nilike_utf8 scalar contains                        1.04      4.3±0.07ms        ? ?/sec                            1.00      4.1±0.05ms        ? ?/sec
nilike_utf8 scalar ends with                       1.00  1228.8±29.01µs        ? ?/sec                            1.00  1229.4±21.51µs        ? ?/sec
nilike_utf8 scalar equals                          1.00   735.4±24.25µs        ? ?/sec                            1.05   773.3±12.76µs        ? ?/sec
nilike_utf8 scalar starts with                     1.05  1174.5±40.82µs        ? ?/sec                            1.00  1121.1±17.17µs        ? ?/sec
nlike_utf8 scalar complex                          1.02  1950.7±31.49µs        ? ?/sec                            1.00  1907.6±24.54µs        ? ?/sec
nlike_utf8 scalar contains                         1.10  1728.3±15.41µs        ? ?/sec                            1.00  1568.6±15.88µs        ? ?/sec
nlike_utf8 scalar ends with                        1.00    420.2±8.39µs        ? ?/sec                            1.00    421.9±7.39µs        ? ?/sec
nlike_utf8 scalar equals                           1.00    109.6±0.28µs        ? ?/sec                            1.16    126.9±0.13µs        ? ?/sec
nlike_utf8 scalar starts with                      1.00    344.2±6.52µs        ? ?/sec                            1.03   355.1±17.11µs        ? ?/sec
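
As a reading aid for the table above, the following sketch shows how the first numeric column in each group appears to be derived, assuming it is that branch's mean time divided by the faster of the two means (an assumption, though it is consistent with the rows shown). `relative_ratios` is a hypothetical helper, not part of any benchmarking tool.

```rust
/// Relative ratios for one benchmark row: each mean divided by the fastest mean.
/// Assumption: this mirrors how the ratio column in the table is computed.
fn relative_ratios(means: &[f64]) -> Vec<f64> {
    let fastest = means.iter().cloned().fold(f64::INFINITY, f64::min);
    means.iter().map(|m| m / fastest).collect()
}

fn main() {
    // `like_utf8 scalar contains`: PR branch vs master, mean times in microseconds.
    let ratios = relative_ratios(&[1728.7, 1565.7]);
    println!("{:.2} {:.2}", ratios[0], ratios[1]);
}
```

Read this way, a value of 1.00 marks the faster branch for that row, and the other value is the slowdown factor relative to it. The `±` figures next to each mean indicate how much of a given difference may be noise.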