Simple python strings that are passed and will be converted by pyspark to lit within a column, are mistaken for actual column names in duckdb engine:
Take the following code
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F
spark = DuckDBSession()
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
df = df.withColumn(
"python_string_to_pyspark_string",
F.when(F.col("name") == "Alice", "Something").otherwise("Default"),
)
print(df.sql())
print(df.show())
This would fail with
File "/workspaces/playground.py", line 17, in <module>
print(df.sql())
File "/workspaces/.venv/lib/python3.10/site-packages/sqlframe/base/dataframe.py", line 588, in sql
exp.Select, self.session._optimize(select_expression, dialect=dialect)
File "/workspaces/.venv/lib/python3.10/site-packages/sqlframe/base/session.py", line 422, in _optimize
return optimize(expression, dialect=dialect, schema=self.catalog._schema)
File "/workspaces/.venv/lib/python3.10/site-packages/sqlglot/optimizer/optimizer.py", line 92, in optimize
optimized = rule(optimized, **rule_kwargs)
File "/workspaces/.venv/lib/python3.10/site-packages/sqlglot/optimizer/qualify.py", line 90, in qualify
validate_qualify_columns_func(expression)
File "/workspaces/.venv/lib/python3.10/site-packages/sqlglot/optimizer/qualify_columns.py", line 102, in validate_qualify_columns
raise OptimizeError(f"Column '{column}' could not be resolved{for_table}")
sqlglot.errors.OptimizeError: Column '"alice"' could not be resolved
That's because the== "Alice" is interpreted as column name, not lit of a column.
Pyspark will automatically convert the == "Alice" to == F.lit("Alice") and specifying this would work with duckdb as well.
Hi,
Simple python strings that are passed and will be converted by pyspark to
lit
within a column, are mistaken for actual column names in duckdb engine:Take the following code
This would fail with
That's because the
== "Alice"
is interpreted as column name, not lit of a column.Pyspark will automatically convert the
== "Alice"
to== F.lit("Alice")
and specifying this would work with duckdb as well.What do you think?