Open alamb opened 2 days ago
We can introduce TypeSignature::String similar to Numeric one
Would that allow to avoid use of Signature::user_defined
? if functions use user_defined
, then same functionality will have to be implemented independently, and inconsistencies are inevitable.
Can we make it a goal that not to require Signature::user_defined
to implement useful things?
BTW this related to two topics
Can we make it a goal that not to require Signature::user_defined to implement useful things?
Yes, I think this should be the goal
The usecase for Signature::user_defined
that I recall was to implement coercion rules for very specific functions (like coalesce)
We can introduce TypeSignature::String similar to Numeric one
I still don't fully understand why we can't just adjust the existing coercion rules to coerce Null to any needed type
Can we make it a goal that not to require Signature::user_defined to implement useful things?
Yes, I think this should be the goal
filed https://github.com/apache/datafusion/issues/12725 for this.
I still don't fully understand why we can't just adjust the existing coercion rules to coerce Null to any needed type
i would assume this is the case today.
https://github.com/apache/datafusion/pull/12712 changes strpos
to accept various string types (but not Null
), yet it works for null input values too.
Would that allow to avoid use of
Signature::user_defined
? if functions useuser_defined
, then same functionality will have to be implemented independently, and inconsistencies are inevitable. Can we make it a goal that not to requireSignature::user_defined
to implement useful things?BTW this related to two topics
concept of DF types / logical types / SQL types - [Proposal] Decouple logical from physical types #11513 / Extension Types #12644
- the function signatures should be expressible in terms abstracting from physical representation
simple functions Simple Functions #12635
- the function should be declared and implemented in terms abstracting from physical representation
Signature::user_defined
is only used for special function ideally, it is used in many places because we don't have a nice signature system yet.
I have an old issue about the coercion rule and type signature https://github.com/apache/datafusion/issues/10507
I think the downside of coerce_from
is that since all the coercion rule is in the one place, any change to it has unknown effect therefore hard to maintain, also it is not extensible. This is why we come out many other TypeSignature that avoid to go through the coerce_from
.
Ideally we should have general signature like TypeSignature::Numeric
or TypeSignature::String
for most of the function. Only if the one that is too specific (only used in one function), we use TypeSignature::user_defined
.
Is your feature request related to a problem or challenge?
The code in https://github.com/apache/datafusion/pull/12308 from @mesejo added explicit
DataType::Null
data type handling for type coercion for certain functionsHowever, this may have introduced a regression for handling Dictionary types https://github.com/apache/datafusion/issues/12670
Describe the solution you'd like
I would like the "normal" function signature resolution to handle Null coercion rather than requiring functions to provide a custom coerce method (as was done in https://github.com/apache/datafusion/pull/12308)
Comments from https://github.com/apache/datafusion/issues/12670#issuecomment-2385333981
Describe alternatives you've considered
Ideally I think the coercion logic should be able to substitute
Null
for any data type passed inSo given a signature like
I would expect the coercion logic to be able to handle inputs like the following (by casting
Null
to Utf8View)Additional context
No response