creates a third column, instead of replacing the existing one:
SELECT
    CAST("a1"."name" AS TEXT) AS "name",
    CAST("a1"."age" AS BIGINT) AS "age",
    CAST("a2"."height" AS BIGINT) AS "height",
    UPPER(CAST("a1"."name" AS TEXT)) AS "name"
FROM (VALUES
    (2, 'Alice'),
    (5, 'Bob')) AS "a1"("age", "name")
LEFT JOIN (VALUES
    (170, 'Alice'),
    (180, 'Bob')) AS "a2"("height", "name")
ON CAST("a1"."name" AS TEXT) = CAST("a2"."name" AS TEXT)
+-------+-----+--------+--------+
| name  | age | height | name_3 |
+-------+-----+--------+--------+
| Alice | 2   | 170    | Alice  |
| Bob   | 5   | 180    | Bob    |
+-------+-----+--------+--------+
It seems the SQL generation simply appends columns as they come, rather than replacing one that already has the same name.
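A minimal sketch of the difference between the two strategies, assuming the projection is tracked as an ordered list of (name, expression) pairs. All helper names here are hypothetical and are not the library's actual API; this only illustrates "append as they come" versus "replace by name".

```python
# Hypothetical helpers for illustration only, not the library's API.

def append_column(projection, name, expr):
    """Append blindly: reusing a name yields a duplicate output column."""
    return projection + [(name, expr)]


def replace_or_append_column(projection, name, expr):
    """Swap the expression of an existing name in place, otherwise append."""
    if any(existing == name for existing, _ in projection):
        return [(n, expr if n == name else e) for n, e in projection]
    return projection + [(name, expr)]


def render_select(projection):
    """Render the projection as the body of a SELECT list."""
    return ", ".join(f'{expr} AS "{name}"' for name, expr in projection)


base = [
    ("name", '"a1"."name"'),
    ("age", '"a1"."age"'),
    ("height", '"a2"."height"'),
]
upper_name = 'UPPER("a1"."name")'

appended = append_column(base, "name", upper_name)             # four columns, "name" twice
replaced = replace_or_append_column(base, "name", upper_name)  # three columns, "name" updated

print(render_select(appended))
print(render_select(replaced))
```

The second helper keeps the column in its original position, which is what I would expect when a column is redefined under an existing name.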
Wouldn't switching the algorithm to build CTEs be preferable (maybe starting with just joins)? In my mental model, each PySpark DataFrame operation would map to one CTE.
I can see this resolving a lot of edge cases and generating cleaner SQL in different scenarios, but this is just a first instinct.
Something like
WITH joined AS (
    SELECT
        CAST("a1"."name" AS TEXT) AS "name",
        CAST("a1"."age" AS BIGINT) AS "age",
        CAST("a2"."height" AS BIGINT) AS "height"
    FROM (VALUES
        (2, 'Alice'),
        (5, 'Bob')) AS "a1"("age", "name")
    LEFT JOIN (VALUES
        (170, 'Alice'),
        (180, 'Bob')) AS "a2"("height", "name")
    ON CAST("a1"."name" AS TEXT) = CAST("a2"."name" AS TEXT)
)
SELECT
    UPPER(CAST("name" AS TEXT)) AS "name",
    -- the rest of the columns that weren't changed pass through as-is
    "age",
    "height"
FROM joined
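A rough sketch of the "one DataFrame operation = one CTE" idea. `CteChain` and `with_column` are hypothetical names, not the library's API, and SQLite (from the Python standard library) is used only so the sketch can be executed. The join is written with `UNION ALL` subqueries instead of `VALUES` purely as an assumption to keep it runnable on SQLite.

```python
import sqlite3


class CteChain:
    """Hypothetical builder: every operation appends exactly one CTE."""

    def __init__(self, source_sql, columns):
        self.ctes = [("t0", source_sql)]
        self.columns = list(columns)

    def with_column(self, name, expr):
        prev = self.ctes[-1][0]
        if name in self.columns:
            # Existing name: replace it in place, pass the rest through.
            select_list = [
                f'{expr} AS "{name}"' if c == name else f'"{c}"'
                for c in self.columns
            ]
        else:
            # New name: pass everything through and append the new column.
            select_list = [f'"{c}"' for c in self.columns]
            select_list.append(f'{expr} AS "{name}"')
            self.columns.append(name)
        cte_name = f"t{len(self.ctes)}"
        self.ctes.append((cte_name, f"SELECT {', '.join(select_list)} FROM {prev}"))
        return self

    def sql(self):
        body = ",\n".join(f"{name} AS ({sql})" for name, sql in self.ctes)
        return f"WITH {body}\nSELECT * FROM {self.ctes[-1][0]}"


joined = (
    "SELECT a1.name AS name, a1.age AS age, a2.height AS height "
    "FROM (SELECT 2 AS age, 'Alice' AS name UNION ALL SELECT 5, 'Bob') AS a1 "
    "LEFT JOIN (SELECT 170 AS height, 'Alice' AS name "
    "UNION ALL SELECT 180, 'Bob') AS a2 "
    "ON a1.name = a2.name"
)

chain = CteChain(joined, ["name", "age", "height"]).with_column("name", 'UPPER("name")')

cursor = sqlite3.connect(":memory:").execute(chain.sql())
result_columns = [d[0] for d in cursor.description]
rows = sorted(cursor.fetchall())
print(result_columns, rows)
```

Because each step only sees the previous CTE's output columns, replacing a column becomes a plain projection and no duplicate name can appear in the final result.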
In short, I would have expected the same behavior as in PySpark: the existing column is replaced rather than a third one being created.