apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.58k stars 1.04k forks source link

Got "LIMIT must not be negative" error as long as it is not a literal #9506

Closed SteveLauC closed 3 months ago

SteveLauC commented 4 months ago

Describe the bug

$ datafusion-cli --color
DataFusion CLI v36.0.0
❯ select * from '1.parquet' limit 1 + 1;
Error during planning: LIMIT must not be negative
❯

It should work as long as it can be evaluated without schema or data, here is how duckdb handles it:

$ duckdb
v0.10.0 20b1486d11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D create table foo (id int);
D select * from foo limit 1 + 1;
┌────────┐
│   id   │
│ int32  │
├────────┤
│ 0 rows │
└────────┘

To Reproduce

See above, the table (parquet file) should not matter.

Expected behavior

1 + 1 should be evaluated to a constant 2.

Additional context

Related code:

https://github.com/apache/arrow-datafusion/blob/fc81bf10ea7cde55c6d58a670e15bfd0581ec8c2/datafusion/sql/src/query.rs#L240-L257

SteveLauC commented 4 months ago

So if DataFusion wants to follow DuckDB's behavior, then I guess we should evaluate that SQLExpr rather than simply converting it to DataFusion's Expr type.

jonahgao commented 4 months ago

I think ideally it should support any scalar expression. In PostgreSQL, even functions and scalar subqueries are allowed.

psql=> select * from t limit (select * from t limit 1);
 a
---
 1
(1 row)

psql=> select * from t limit random() + 1;
 a
---
 1
(1 row)
SteveLauC commented 4 months ago

any scalar expression

Yeah, as long as it can be evaluated to a number, it should work.

alamb commented 3 months ago

I think we could run the Expr simplifier / const folder on the limit value

I am not sure we can do so from within the planner itself 🤔

https://github.com/apache/arrow-datafusion/blob/1e4ddb6d86328cb6596bb50da9ccc654f19a83ea/datafusion/sql/src/query.rs#L249C1-L253C1