apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Epic: Simplify functions signature with LogicalType #13301

Open jayzhan211 opened 1 week ago

jayzhan211 commented 1 week ago

Is your feature request related to a problem or challenge?

Some function signatures are quite verbose, for example:

impl RegexpLikeFunc {
    pub fn new() -> Self {
        Self {
            signature: Signature::one_of(
                vec![
                    TypeSignature::Exact(vec![Utf8View, Utf8]),
                    TypeSignature::Exact(vec![Utf8View, Utf8View]),
                    TypeSignature::Exact(vec![Utf8View, LargeUtf8]),
                    TypeSignature::Exact(vec![Utf8, Utf8]),
                    TypeSignature::Exact(vec![Utf8, Utf8View]),
                    TypeSignature::Exact(vec![Utf8, LargeUtf8]),
                    TypeSignature::Exact(vec![LargeUtf8, Utf8]),
                    TypeSignature::Exact(vec![LargeUtf8, Utf8View]),
                    TypeSignature::Exact(vec![LargeUtf8, LargeUtf8]),
                    TypeSignature::Exact(vec![Utf8View, Utf8, Utf8]),
                    TypeSignature::Exact(vec![Utf8View, Utf8View, Utf8]),
                    TypeSignature::Exact(vec![Utf8View, LargeUtf8, Utf8]),
                    TypeSignature::Exact(vec![Utf8, Utf8, Utf8]),
                    TypeSignature::Exact(vec![Utf8, Utf8View, Utf8]),
                    TypeSignature::Exact(vec![Utf8, LargeUtf8, Utf8]),
                    TypeSignature::Exact(vec![LargeUtf8, Utf8, Utf8]),
                    TypeSignature::Exact(vec![LargeUtf8, Utf8View, Utf8]),
                    TypeSignature::Exact(vec![LargeUtf8, LargeUtf8, Utf8]),
                ],
                Volatility::Immutable,
            ),
        }
    }
}

This can be replaced with `Signature::string(2, Volatility::Immutable)`.

impl LPadFunc {
    pub fn new() -> Self {
        use DataType::*;
        Self {
            signature: Signature::one_of(
                vec![
                    Exact(vec![Utf8View, Int64]),
                    Exact(vec![Utf8View, Int64, Utf8View]),
                    Exact(vec![Utf8View, Int64, Utf8]),
                    Exact(vec![Utf8View, Int64, LargeUtf8]),
                    Exact(vec![Utf8, Int64]),
                    Exact(vec![Utf8, Int64, Utf8View]),
                    Exact(vec![Utf8, Int64, Utf8]),
                    Exact(vec![Utf8, Int64, LargeUtf8]),
                    Exact(vec![LargeUtf8, Int64]),
                    Exact(vec![LargeUtf8, Int64, Utf8View]),
                    Exact(vec![LargeUtf8, Int64, Utf8]),
                    Exact(vec![LargeUtf8, Int64, LargeUtf8]),
                ],
                Volatility::Immutable,
            ),
        }
    }
}

This can be replaced with one of `Signature::coercible(string, int)` or `Signature::coercible(string, int, string)`.
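To illustrate why a logical-type signature is shorter, here is a minimal, self-contained sketch. It is a toy model only: `PhysicalType`, `LogicalType`, `logical_of`, `matches_signature`, and `count_exact_variants` are hypothetical names invented for this example and are not DataFusion's API. It shows that one logical signature stands in for every combination of physical string and integer types.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
    Int64,
    Float64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    String,
    Integer,
    Float,
}

const ALL: [PhysicalType; 6] = [
    PhysicalType::Utf8,
    PhysicalType::LargeUtf8,
    PhysicalType::Utf8View,
    PhysicalType::Int32,
    PhysicalType::Int64,
    PhysicalType::Float64,
];

/// Map each physical encoding to the logical type it represents.
fn logical_of(t: PhysicalType) -> LogicalType {
    match t {
        PhysicalType::Utf8 | PhysicalType::LargeUtf8 | PhysicalType::Utf8View => {
            LogicalType::String
        }
        PhysicalType::Int32 | PhysicalType::Int64 => LogicalType::Integer,
        PhysicalType::Float64 => LogicalType::Float,
    }
}

/// A single logical signature accepts every matching physical combination.
fn matches_signature(expected: &[LogicalType], args: &[PhysicalType]) -> bool {
    expected.len() == args.len()
        && expected.iter().zip(args).all(|(e, a)| *e == logical_of(*a))
}

/// Number of `Exact`-style entries needed to spell out the same signature.
fn count_exact_variants(expected: &[LogicalType]) -> usize {
    expected
        .iter()
        .map(|e| ALL.iter().filter(|p| logical_of(**p) == *e).count())
        .product()
}
```

In this model, `count_exact_variants(&[LogicalType::String, LogicalType::String])` evaluates to 9, which mirrors the nine two-argument `Exact` entries in `RegexpLikeFunc`.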

Describe the solution you'd like

#13240 starts an attempt to bring logical types to function signatures.

There are more functions to clean up. The examples above can be replaced with `TypeSignature::String`, `TypeSignature::Numeric`, or `TypeSignature::Coercible`.

We might also need a time-related signature for time functions.

Improving test coverage for these functions, especially with different kinds of types, would be great 👍

Describe alternatives you've considered

No response

The role of TypeSignature

`TypeSignature` is used by functions and is responsible for handling:

  1. length check
  2. type checking and casting
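The two responsibilities listed above can be sketched as a small two-step routine. This is a hedged illustration, not DataFusion's implementation: `Type`, `implicit_cast_ok`, and `coerce_args` are hypothetical names, and the single cast rule (Int64 to Float64) is an assumption chosen for the example.

```rust
// Toy model only: hypothetical types and rules, not DataFusion's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Type {
    Int64,
    Float64,
    Utf8,
}

/// Assumed rule for this sketch: identical types match, and the only
/// implicit cast allowed is Int64 -> Float64.
fn implicit_cast_ok(from: Type, to: Type) -> bool {
    from == to || (from == Type::Int64 && to == Type::Float64)
}

/// Return the argument types after coercion, or an error message.
fn coerce_args(expected: &[Type], actual: &[Type]) -> Result<Vec<Type>, String> {
    // 1. length check
    if expected.len() != actual.len() {
        return Err(format!(
            "expected {} arguments, got {}",
            expected.len(),
            actual.len()
        ));
    }
    // 2. type checking and casting
    expected
        .iter()
        .zip(actual)
        .enumerate()
        .map(|(i, (want, have))| {
            if implicit_cast_ok(*have, *want) {
                Ok(*want)
            } else {
                Err(format!(
                    "argument {}: cannot implicitly cast {:?} to {:?}",
                    i, have, want
                ))
            }
        })
        .collect()
}
```

With these assumed rules, calling `coerce_args(&[Type::Float64], &[Type::Int64])` succeeds and yields a `Float64` argument, while a wrong argument count or a `Utf8` value passed where `Int64` is expected returns an error.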

Function behaviour should follow Postgres, DuckDB, or another well-designed database. We can check whether the result and the coercion are consistent with them.

If Postgres and DuckDB agree on the result, we should follow them. Otherwise, we follow one of them.

For the casting rule, I think we can follow DuckDB's casting rule described here

TypeSignature should handle implicit casting

Implicit Casting

In many situations, the system will add casts by itself. This is called implicit casting. This happens for example when a function is called with an argument that does not match the type of the function, but can be casted to the desired type.

Consider the function sin(DOUBLE). This function takes as input argument a column of type DOUBLE, however, it can be called with an integer as well: sin(1). The integer is converted into a double before being passed to the sin function.

Implicit casts can only be added for a number of type combinations, and is generally only possible when the cast cannot fail. For example, an implicit cast can be added from INTEGER to DOUBLE – but not from DOUBLE to INTEGER.
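The `sin(1)` case described above can be sketched as a one-directional cast check. This is a minimal illustration with assumed rules: `SqlType`, `can_cast_implicitly`, and `bind_sin` are hypothetical names, and the cast table is a small illustrative subset, not DuckDB's full implicit-cast matrix.

```rust
// Toy model only: an illustrative subset of implicit casts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SqlType {
    Integer,
    BigInt,
    Double,
    Varchar,
}

/// Widening casts are allowed; the reverse directions are not.
fn can_cast_implicitly(from: SqlType, to: SqlType) -> bool {
    use SqlType::*;
    match (from, to) {
        (a, b) if a == b => true,
        (Integer, BigInt) | (Integer, Double) | (BigInt, Double) => true,
        _ => false,
    }
}

/// Bind a call to a function declared as sin(DOUBLE): the argument is
/// accepted if it is DOUBLE or can be implicitly cast to DOUBLE.
fn bind_sin(arg: SqlType) -> Result<SqlType, String> {
    if can_cast_implicitly(arg, SqlType::Double) {
        Ok(SqlType::Double)
    } else {
        Err(format!("no implicit cast from {:?} to Double", arg))
    }
}
```

Under these assumptions, `bind_sin(SqlType::Integer)` is accepted and the argument is treated as `Double`, while `can_cast_implicitly(SqlType::Double, SqlType::Integer)` is false because that cast could fail or lose information.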


Good first issue list (non-completed)

solomonope commented 1 week ago

take

jayzhan211 commented 4 days ago

@jonathanc-n You can take a look at this if you are interested. Many functions require this change, so it cannot be resolved in a single PR. Signatures like `Exact` and `Uniform` are the ones targeted for replacement.

jonathanc-n commented 3 days ago

Thanks, I'll look into it!