apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support like operation on binary operands in arrow-string #4725

Open JayjeetAtGithub opened 1 year ago

JayjeetAtGithub commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Stems from https://github.com/apache/arrow-datafusion/issues/7342

Describe the solution you'd like

Add a like_binary_scalar function in arrow-string/src/like.rs

Describe alternatives you've considered

Additional context

tustvold commented 1 year ago

Are these kernels well defined for binary inputs? Do other databases support them? I thought LIKE implicitly had a notion of a character, which in turn would require a known encoding.

tustvold commented 1 year ago

It would appear duckdb does not support this, but postgres does - https://dbfiddle.uk/6jMPTJjs

Further investigation would suggest postgres does not attempt to handle unicode when fed bytea strings, treating each byte as a separate "character" - https://dbfiddle.uk/mSxJt2ya. ILIKE is not supported.
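To make the byte-wise reading concrete, here is a minimal sketch of LIKE matching where every byte counts as one "character". The function name `like_bytes` is hypothetical and is not an arrow-rs API. The sketch assumes `%` matches any run of bytes (including none) and `_` matches exactly one byte, and it deliberately ignores escape characters.

```rust
/// Hypothetical helper (not part of arrow-rs): byte-wise LIKE matching.
/// Assumptions: `%` matches any run of bytes, including an empty one;
/// `_` matches exactly one byte; escape characters are not handled.
pub fn like_bytes(value: &[u8], pattern: &[u8]) -> bool {
    let (mut v, mut p) = (0usize, 0usize);
    // Pattern index of the most recent `%` and the value index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;

    while v < value.len() {
        if p < pattern.len() && pattern[p] == b'%' {
            // Remember this wildcard and first try matching zero bytes with it.
            backtrack = Some((p, v));
            p += 1;
        } else if p < pattern.len() && (pattern[p] == b'_' || pattern[p] == value[v]) {
            // Single-byte wildcard or literal byte match.
            p += 1;
            v += 1;
        } else if let Some((bp, bv)) = backtrack {
            // Mismatch: let the last `%` absorb one more byte and retry.
            backtrack = Some((bp, bv + 1));
            p = bp + 1;
            v = bv + 1;
        } else {
            return false;
        }
    }

    // The value is exhausted; only trailing `%` may remain in the pattern.
    pattern[p..].iter().all(|&b| b == b'%')
}
```

Under these assumptions, the UTF-8 encoding of "é" (two bytes, 0xC3 0xA9) does not match the pattern `_` but does match `__`, which is where the byte-wise reading diverges from character-based LIKE on strings.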

I think more investigation into other systems is warranted before proceeding here; the semantics seem a touch confused.

alamb commented 1 year ago

I think this is something we can handle at the datafusion level. See https://github.com/apache/arrow-datafusion/issues/7342#issuecomment-1690297040