apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.83k stars 1.1k forks source link

[DISCUSSION] Potentially consolidate Expr::IsNotUnknown, `IsKnown`, etc #11282

Open alamb opened 1 month ago

alamb commented 1 month ago

Is your feature request related to a problem or challenge?

@findepi noted in slack

What is IsNotUnknown expression (and why IsNotNull is not enough)?

@Dandandan noted that

I think it is the same, yes. AFAICT we wouldn't need it and we could transform it to IsNotNull

Describe the solution you'd like

It would be nice to look into Expr Variants like IsNotUnknown,IsUnknown, IsTrue, IsFalse, (e.g. https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#variant.IsUnknown )

And see if there is a way to consolidate them into a few number of variants

Describe alternatives you've considered

No response

Additional context

I believe most of these variants came in via #3275 from @

jonahgao commented 1 month ago

When the parameter is a boolean, IS NULL and IS UNKNOWN are equivalent. According to the SQL standard

The Boolean data type comprises the distinct truth values True and False. Unless prohibited by a NOT NULL constraint, the Boolean data type also supports the truth value Unknown as the null value. This specification does not make a distinction between the null value of the Boolean data type and the truth value Unknown; they may be used interchangeably to mean exactly the same thing.

UNKNOWN is the third truth value. So the difference is that UNKNOWN can only be applied to boolean types. DataFusion and PostgreSQL adhere to this, but DuckDB does not.

postgres=# select 1 is unknown;
ERROR:  argument of IS UNKNOWN must be type boolean, not type integer
LINE 1: select 1 is unknown;
DataFusion CLI v39.0.0
> select 1 is unknown;
type_coercion
caused by
Error during planning: Cannot infer common argument type for comparison operation Int64 IS DISTINCT FROM Boolean

In DuckDB:

D select 1 is unknown;
┌─────────────┐
│ (1 IS NULL) │
│   boolean   │
├─────────────┤
│ false       │
└─────────────┘
findepi commented 1 month ago

Indeed, from SQL 2016 spec

<boolean test> ::=
  <boolean primary> [ IS [ NOT ] <truth value> ]

<truth value> ::=
    TRUE
  | FALSE
  | UNKNOWN

I didn't know about this syntax.

Thanks @alamb @jonahgao . So we don't want to unify. Just make sure the IS [NOT] ( TRUE | FALSE | UNKNOWN ) is applicable to booleans only.

findepi commented 1 month ago

Which in turn means we need those duplicated constructs on the SQL layer, but we could then desugar and replace IS [NOT] UNKNOWN with IS [NOT] NULL at runtime.

alamb commented 1 month ago

I like the idea of desugaring (I think that would map to consolidating some of the Enum variants) 🚀