apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

pyarrow.compute.Expression.to_substrait() is missing conversions #39691

Open davlee1972 opened 9 months ago

davlee1972 commented 9 months ago

Describe the usage question you have. Please include as many useful details as possible.

I'm not sure why this conversion is missing:

pyarrow.lib.ArrowNotImplementedError: No conversion function exists to convert the Arrow function ends_with to a Substrait call

Functions like pyarrow.compute.ends_with() should be mapped to Substrait's ends_with():

https://substrait.io/extensions/functions_string/

ends_with
Implementations:
ends_with(input, substring, option:case_sensitivity): -> return_type

input: The input string.
substring: The substring to search for.
0. ends_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> BOOLEAN
1. ends_with(varchar<L1>, string, option:case_sensitivity): -> BOOLEAN
2. ends_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> BOOLEAN
3. ends_with(string, string, option:case_sensitivity): -> BOOLEAN
4. ends_with(string, varchar<L1>, option:case_sensitivity): -> BOOLEAN
5. ends_with(string, fixedchar<L1>, option:case_sensitivity): -> BOOLEAN
6. ends_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> BOOLEAN
7. ends_with(fixedchar<L1>, string, option:case_sensitivity): -> BOOLEAN
8. ends_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> BOOLEAN

https://arrow.apache.org/docs/python/generated/pyarrow.compute.ends_with.html

>>> import pyarrow.compute as pc
>>> import pyarrow as pa

>>> exp = pc.ends_with(pc.field("first_name"), "rine")
>>> exp
<pyarrow.compute.Expression ends_with(first_name, {pattern="rine", ignore_case=false})>

>>> substrait_expression = exp.to_substrait(pa.schema([pa.field('first_name', 'string')]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_compute.pyx", line 2458, in pyarrow._compute.Expression.to_substrait
  File "pyarrow/_substrait.pyx", line 247, in pyarrow._substrait.serialize_expressions
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: No conversion function exists to convert the Arrow function ends_with to a Substrait call
>>>
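To see which expressions hit this gap, a small probe along these lines may help. It is only a sketch: `converts_to_substrait` and `demo` are hypothetical helper names, not part of pyarrow, and it relies on `ArrowNotImplementedError` being a subclass of Python's built-in `NotImplementedError`. It also assumes a pyarrow build with Substrait support.

```python
def converts_to_substrait(expr, schema):
    """Return (True, None) if expr serializes to Substrait, else (False, message).

    Hypothetical helper: it only calls expr.to_substrait(schema) and catches
    NotImplementedError, which pyarrow.lib.ArrowNotImplementedError subclasses.
    """
    try:
        expr.to_substrait(schema)
    except NotImplementedError as exc:
        return False, str(exc)
    return True, None


def demo():
    # Imports are local so the helper above can be reused without pyarrow loaded.
    import pyarrow as pa
    import pyarrow.compute as pc

    schema = pa.schema([pa.field("first_name", pa.string())])
    candidates = {
        # A plain comparison, included for contrast with ends_with.
        "equal": pc.field("first_name") == pc.scalar("Catherine"),
        # The expression from this report.
        "ends_with": pc.ends_with(pc.field("first_name"), "rine"),
    }
    for name, expr in candidates.items():
        ok, message = converts_to_substrait(expr, schema)
        print(name, "converts" if ok else f"fails: {message}")
```

Calling `demo()` prints one line per candidate expression. More functions can be added to `candidates` to map out which ones currently lack a conversion.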

Component(s)

C++, Python

mbwhite commented 8 months ago

I've observed something similar, and would like to understand how these conversions can be added. Can they be configured via an API, or does it require a change to the underlying code?