apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++][Compute] Add support for generic conversions to Function::DispatchBest #27386

Open asfimport opened 3 years ago

asfimport commented 3 years ago

ARROW-8919 adds support for execution with implicit casts to any function which overrides DispatchBest, allowing functions to specify conversions which make sense in that function's context. For example, "add" can promote its arguments if their types disagree. By contrast, some conversions are more generic and could apply to any function's arguments. For example, if any datum is dictionary encoded, a kernel which accepts the decoded type should be usable with an implicit decoding cast:


import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(['hello '] * 10)
enc = arr.dictionary_encode()

# result should not depend on encoding:
assert pc.ascii_is_alnum(arr).equals(pc.ascii_is_alnum(enc))

# currently raises:
# ArrowNotImplementedError: Function ascii_is_alnum has no kernel matching
#    input types (array[dictionary<values=string, indices=int32, ordered=0>])

Reporter: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-11508. Please see the migration documentation for further details.

asfimport commented 3 years ago

Eduardo Ponce / @edponce: A possible solution is to establish rules for converting each generic datum to a "base/operational" form, and to expose them as a method (e.g. `to_base/operational_form()`). When an operation's inputs are of different non-primitive types that provide these conversion methods, convert them to their conformant "base/operational" form and then apply the operation. The following are aspects to consider:

asfimport commented 2 years ago

Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.