JayjeetAtGithub opened this issue 1 year ago (status: open).
Are these kernels well defined for binary inputs? Do other databases support them? I thought LIKE implicitly had a notion of a character, which in turn would require a known encoding.
It would appear DuckDB does not support this, but PostgreSQL does: https://dbfiddle.uk/6jMPTJjs
Further investigation suggests that PostgreSQL does not attempt to handle Unicode when given bytea values, instead treating each byte as a separate "character": https://dbfiddle.uk/mSxJt2ya. ILIKE is not supported for bytea.
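The encoding concern can be made concrete with a small Rust illustration (standard library only). The same text has a different length when counted as characters than when counted as UTF-8 bytes, so under byte-wise semantics a single `_` wildcard would consume only part of a multi-byte character. `char_and_byte_len` is a hypothetical helper for illustration, not part of arrow-rs.

```rust
/// Returns (character count, byte count) for a UTF-8 string.
/// A text LIKE counts characters; a byte-wise LIKE over binary data
/// (the PostgreSQL bytea behaviour observed in this thread) counts bytes.
fn char_and_byte_len(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    let (chars, bytes) = char_and_byte_len("café");
    // 'é' (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9,
    // so byte-wise, one `_` would match only half of it.
    println!("chars = {chars}, bytes = {bytes}"); // chars = 4, bytes = 5
    println!("'é' as bytes: {:02X?}", "é".as_bytes()); // [C3, A9]
}
```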
I think more investigation into other systems is warranted before proceeding here; the semantics seem a touch confused.
I think this is something we can handle at the DataFusion level. See https://github.com/apache/arrow-datafusion/issues/7342#issuecomment-1690297040
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Stems from https://github.com/apache/arrow-datafusion/issues/7342
Describe the solution you'd like
Add a `like_binary_scalar` function in `arrow_string/src/like.rs`.
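A minimal sketch of what such a function might compute, assuming the PostgreSQL bytea semantics discussed in this thread (each byte is one "character", `\` is the escape). The name `like_binary_scalar` comes from the request, but the signature here is hypothetical: an arrow-rs kernel would presumably take a `BinaryArray` and produce a `BooleanArray`, whereas this version uses `&[Option<&[u8]>]` and `Vec<Option<bool>>` so it runs with only the standard library. The helper `like_bytes` is likewise introduced for illustration.

```rust
/// Byte-wise LIKE, treating every byte as one "character" (assumed semantics):
/// `%` matches any run of bytes (including none), `_` matches exactly one byte,
/// and `\` makes the following pattern byte literal. A trailing `\` is treated
/// as a literal byte in this sketch.
fn like_bytes(value: &[u8], pattern: &[u8]) -> bool {
    let (mut v, mut p) = (0usize, 0usize);
    // (pattern index just after the last `%`, value index where that `%` stops)
    let mut star: Option<(usize, usize)> = None;
    while v < value.len() {
        if p < pattern.len() {
            match pattern[p] {
                b'%' => {
                    star = Some((p + 1, v));
                    p += 1;
                    continue;
                }
                b'_' => {
                    v += 1;
                    p += 1;
                    continue;
                }
                b'\\' if p + 1 < pattern.len() => {
                    if pattern[p + 1] == value[v] {
                        v += 1;
                        p += 2;
                        continue;
                    }
                }
                c => {
                    if c == value[v] {
                        v += 1;
                        p += 1;
                        continue;
                    }
                }
            }
        }
        // Mismatch or pattern exhausted: let the last `%` absorb one more byte.
        match star {
            Some((sp, sv)) => {
                star = Some((sp, sv + 1));
                p = sp;
                v = sv + 1;
            }
            None => return false,
        }
    }
    // Value exhausted: whatever is left of the pattern must be only `%`.
    pattern[p..].iter().all(|&c| c == b'%')
}

/// Hypothetical `like_binary_scalar`: applies one scalar pattern to a nullable
/// binary column (modelled as a slice, not an Arrow array). Nulls stay null.
fn like_binary_scalar(values: &[Option<&[u8]>], pattern: &[u8]) -> Vec<Option<bool>> {
    values
        .iter()
        .copied()
        .map(|v| v.map(|bytes| like_bytes(bytes, pattern)))
        .collect()
}

fn main() {
    // Last value is "é" in UTF-8, i.e. the two bytes 0xC3 0xA9.
    let col: Vec<Option<&[u8]>> = vec![
        Some(&b"arrow"[..]),
        Some(&b"\xDE\xAD\xBE\xEF"[..]),
        None,
        Some("é".as_bytes()),
    ];
    // Two single-byte wildcards match the two-byte encoding of "é".
    println!("{:?}", like_binary_scalar(&col, b"__"));
    // prints [Some(false), Some(false), None, Some(true)]
}
```

Because the matcher works on bytes, `_` never matches a whole multi-byte UTF-8 character; a text-aware LIKE would have to decode first, which is exactly why the semantics for binary input need to be pinned down.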