eakmanrq / sqlframe

Turning PySpark Into a Universal DataFrame API
https://sqlframe.readthedocs.io/en/stable/
MIT License

.show() fails, duckdb engine #88

Closed cristian-marisescu closed 2 weeks ago

cristian-marisescu commented 2 weeks ago

This one is interesting.

Building anything with F.expr() and calling .show() fails, even though the generated SQL is valid and runs correctly.

Code

from sqlframe.duckdb import DuckDBDataFrame as DataFrame
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F

spark = DuckDBSession()

initial: DataFrame = spark.createDataFrame(
    [
        ("data1", None),
        ("data2", "data3"),
    ],
    ["data_col", "another_col"],
)

expr_test: DataFrame = initial.select("*", F.col("data_col").alias("something"))
print(expr_test.sql())
print(expr_test.show())

Output and error

SELECT
  CAST("a1"."data_col" AS TEXT) AS "data_col",
  "a1"."another_col" AS "another_col",
  CAST("a1"."data_col" AS TEXT) AS "something"
FROM (VALUES
  ('data1', NULL),
  ('data2', 'data3')) AS "a1"("data_col", "another_col")
Traceback (most recent call last):
  File "/workspaces/playground.py", line 19, in <module>
    print(expr_test.show())
  File "/workspaces/.venv/lib/python3.10/site-packages/sqlframe/base/dataframe.py", line 1560, in show
    table.add_row(list(row))
  File "/workspaces/.venv/lib/python3.10/site-packages/prettytable/prettytable.py", line 1413, in add_row
    raise ValueError(msg)
ValueError: Row has incorrect number of values, (actual) 4!=3 (expected)

The same failure occurs for every call of the form .select("*", <column expression>).
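To make the error message easier to read, here is a minimal pure-Python sketch of the kind of width check that produces it. This is not prettytable's or sqlframe's actual code: `add_row_checked` is a hypothetical helper, and the contents of the fourth value are an assumption, since the traceback only reports that the row had 4 values against 3 expected column names.

```python
from typing import Sequence


def add_row_checked(field_names: Sequence[str], row: Sequence[object]) -> list:
    """Hypothetical helper: reject a row whose width differs from the header."""
    if len(row) != len(field_names):
        raise ValueError(
            "Row has incorrect number of values, "
            f"(actual) {len(row)}!={len(field_names)} (expected)"
        )
    return list(row)


# The three output columns named in the generated SQL.
header = ["data_col", "another_col", "something"]

# A row of the expected width is accepted.
add_row_checked(header, ["data1", None, "data1"])

# A row carrying one extra value is rejected with the same message as in the report.
# Assumption: which value is the extra one is unknown; a duplicate is used here.
try:
    add_row_checked(header, ["data1", None, "data1", "data1"])
except ValueError as exc:
    message = str(exc)
    print(message)
```

The sketch only shows that the header and the fetched rows disagree on width by one. Where that extra value comes from inside sqlframe is not stated in this thread.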

eakmanrq commented 2 weeks ago

This one was a bit tricky, but it should be fixed in 1.12.0. Thanks for reporting these issues, and please open more if you run into anything else.