duckdb engine, .withColumn after a join doesn't replace the key column, but creates a new one

Hi, Writing the following

import sqlframe.duckdb.functions as F
from sqlframe.duckdb import DuckDBSession

spark = DuckDBSession()

df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
height = spark.createDataFrame([(170, "Alice"), (180, "Bob")], ["height", "name"])

df = df.join(height, how="left", on="name").withColumn("name", F.upper("name"))

print(df.sql())
print(df.show())

creates a third column, instead of replacing the existing one:

SELECT
  CAST("a1"."name" AS TEXT) AS "name",
  CAST("a1"."age" AS BIGINT) AS "age",
  CAST("a2"."height" AS BIGINT) AS "height",
  UPPER(CAST("a1"."name" AS TEXT)) AS "name"
FROM (VALUES
  (2, 'Alice'),
  (5, 'Bob')) AS "a1"("age", "name")
LEFT JOIN (VALUES
  (170, 'Alice'),
  (180, 'Bob')) AS "a2"("height", "name")
  ON CAST("a1"."name" AS TEXT) = CAST("a2"."name" AS TEXT)
+-------+-----+--------+--------+
|  name | age | height | name_3 |
+-------+-----+--------+--------+
| Alice |  2  |  170   | Alice  |
|  Bob  |  5  |  180   |  Bob   |
+-------+-----+--------+--------+

I would've expected same behavior as in pyspark:

| name|age|height|
+-----+---+------+
|ALICE|  2|   170|
|  BOB|  5|   180|
+-----+---+------+

Seems that generating sql is building columns as they come.

Wouldn't switching the algorithm to build CTE be preferable (maybe starting with just joins) ? In my mind, the model would be actually each pyspark dataframe action = one CTE

I see resolving a lot of edge cases + generating cleaner sql in different scenarios, but this is just a first instinct.

Something like

WITH joined as(
SELECT
  CAST("a1"."name" AS TEXT) AS "name",
  CAST("a1"."age" AS BIGINT) AS "age",
  CAST("a2"."height" AS BIGINT) AS "height",
FROM (VALUES
  (2, 'Alice'),
  (5, 'Bob')) AS "a1"("age", "name")
LEFT JOIN (VALUES
  (170, 'Alice'),
  (180, 'Bob')) AS "a2"("height", "name")
  ON CAST("a1"."name" AS TEXT) = CAST("a2"."name" AS TEXT))
SELECT UPPER(CAST("name" AS TEXT)) AS "name", 'rest_of_columns_that_werent_changed' FROM joined

eakmanrq / sqlframe

duckdb engine, .withColumn after a join doesn't replace the key column, but creates a new one #102