apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

String search kernel optimisations #6107

Open samuelcolvin opened 1 month ago

samuelcolvin commented 1 month ago

The main context for this is well described by https://github.com/BurntSushi/memchr/pull/156.

I think (in rough order of impact) we should:

(I'm not suggesting that we make quick_strings a dependency; it was just a scratch experiment. If we use any of that code, we should copy it.)

samuelcolvin commented 1 month ago

I'm keen to try and work on this.

alamb commented 1 month ago

Thanks @samuelcolvin

I think in general the basic requirement for performance optimizations in this crate is benchmarks that show performance improvements to justify the additional code complexity / maintenance burden.

I think there are already several cargo bench style benchmarks for string operations. A good first step might be to review them and add any cases that are not yet covered and would benefit from the optimizations described above.

alamb commented 1 month ago

I think @Dandandan and @jhorstmann are especially excited by low level optimizations like this 😁

samuelcolvin commented 1 month ago

While working on this, I found #6145. We should merge that first, then rebase and review the other PRs here.