Hi @cpcloud. I am unsure about this. We throw this error because, when binding the RHS of the lambda function (x + 1 in your case), it is possible to use other columns of the table, which may have the same name as the lambda parameter. The binder error is unrelated to the input list. For example, the following query also triggers the binder error.
create or replace table t as select [1, 2, 3] as x;
select list_apply(['hello'], x -> x) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified
If we allow column names as lambda parameters, the following query would be ambiguous.
create or replace table t as select [[1], [2], [3]] as x;
select list_transform([[1], [2], [3]], x -> x[1]) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified
# possible result: x is a lambda parameter
[1, 2, 3]
# possible result: x is a column name in t
[[1], [1], [1]]
Using the column as the input list is possible, i.e., the following query works.
D create or replace table t as select [1, 2, 3] as x;
D select list_apply(t.x, y -> y + 1) from t;
┌─────────────────────────────────┐
│ list_apply(t.x, (y -> (y + 1))) │
│ int32[] │
├─────────────────────────────────┤
│ [2, 3, 4] │
└─────────────────────────────────┘
I can look into making the error message clearer in this scenario, but I don't see a way to address the possible ambiguity between lambda parameter names and column names in the RHS of the lambda expression.
@taniabogatsch Thanks for looking into it.
I'm not sure I understand the ambiguity here, as long as there are well-defined scoping rules for lambdas.
create or replace table t as select [[1], [2], [3]] as x;
select list_transform([[1], [2], [3]], x -> x[1]) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified
Why would x be ambiguous here? It seems to me that this is unambiguously referring to the x argument of the lambda function and couldn't possibly ever be t.x without the table identifier qualifying it. I would expect lexical scoping here:
1. x is first looked up as a local variable (the lambda parameter)
2. only if no such parameter exists is x looked up as a column
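For illustration, here is a sketch of how the two readings from your earlier example would stay distinguishable under such scoping (assuming the column remains reachable from the lambda body whenever the parameter does not shadow it):
create or replace table t as select [[1], [2], [3]] as x;
-- x resolves to the lambda parameter and shadows the column: expected [1, 2, 3]
select list_transform([[1], [2], [3]], x -> x[1]) from t;
-- a different parameter name leaves x free to bind to the column t.x: expected [[1], [1], [1]]
select list_transform([[1], [2], [3]], y -> x[1]) from t;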
Other systems that implement equivalent functionality, such as ClickHouse and Trino, do not have this problem.
ClickHouse:
❯ clickhouse local
ClickHouse local version 23.11.1.1.
localhost :) create view t as select [1, 2, 3] x;
CREATE VIEW t AS
SELECT [1, 2, 3] AS x
Query id: 8c03f649-7cc9-4619-b141-765b9ab17ef7
Ok.
0 rows in set. Elapsed: 0.000 sec.
localhost :) select arrayMap(x -> x + 1, x) from t
SELECT arrayMap(x -> (x + 1), x)
FROM t
Query id: ca1b3c99-f996-480b-b445-98777383a5ef
┌─arrayMap(lambda(tuple(x), plus(x, 1)), x)─┐
│ [2,3,4] │
└───────────────────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
Trino:
❯ trino --catalog memory --schema default
trino:default> create view t as select array[1, 2, 3] as x;
CREATE VIEW
trino:default> select transform(x, x -> x + 1) from t;
_col0
-----------
[2, 3, 4]
(1 row)
Query 20231218_115821_00236_x3bsy, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.10 [0 rows, 0B] [0 rows/s, 0B/s]
We bind lambda functions with our SQL scalar function binder, which does not have a concept of local variables that are not columns. I.e., the binder treats lambda parameters the same as it treats columns. That's also why we recently pushed this PR...
Regarding "and couldn't possibly ever be t.x without the table identifier qualifying it": SQL scalar functions can reference column names without table identifiers, and that should not change for lambda functions.
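For instance, a lambda body can already reference a column by its unqualified name today, as long as the parameter does not shadow it; a sketch (the column name shift is hypothetical):
create or replace table t as select [1, 2, 3] as x, 10 as shift;
-- the unqualified column reference shift is bound like in any scalar expression: expected [11, 12, 13]
select list_transform(x, y -> y + shift) from t;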
But after looking into this more now, I agree that adding scoping here for lambda functions, which are already a special case, can make sense and make lambda functions more intuitive to use. I should be able to change the lambda binding to prioritize lambda parameter names over column names, and inner lambda parameter names over outer lambda parameter names, like so: inner lambda > outer lambda > column.
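As an illustration of that resolution order, a sketch with nested lambdas that reuse the same parameter name (hypothetical data; the expected result assumes the new scoping):
create or replace table t as select 100 as x;
-- the list argument x binds to the outer lambda parameter, the innermost x to the inner one,
-- and neither falls through to the column: expected [[2, 3], [4, 5]]
select list_transform([[1, 2], [3, 4]], x -> list_transform(x, x -> x + 1)) from t;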
PR got merged.
Much appreciated!
What happens?
Using a parameter name in a lambda function that matches the name of the input column results in a strange error message about needing an unqualified parameter name.
To Reproduce
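The exact query from the report is not shown in this excerpt; a reconstruction consistent with the discussion above would be:
create or replace table t as select [1, 2, 3] as x;
-- the lambda parameter name matches the input column name:
select list_apply(x, x -> x + 1) from t;
-- expected error (per the report): Binder Error: Invalid parameter name 't.x': must be unqualified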
Since the xs are unrelated to each other here, I would expect there to be no issue executing this query.
OS:
Linux (NixOS)
DuckDB Version:
v0.9.2
DuckDB Client:
CLI
Full Name:
Phillip Cloud
Affiliation:
Voltron Data
Have you tried this on the latest main branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?