eredzik opened 1 month ago
What is interesting is that your workaround with the alias is still not correct. Because there are duplicate store_id columns, DuckDB is picking the first one, whereas Spark automatically resolves the ambiguity here. In the first example, the "store_id" column from "store" is used instead of the one from "employee". That breaks the general rule that columns are resolved left to right, likely because the column was used in the join condition.
This is a bit trickier to solve, since I need to better understand how Spark handles this kind of ambiguity in order to properly replicate it. Thanks for pointing this out!
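For concreteness, here is a minimal sketch of the kind of join being discussed, written in plain PySpark (the employee/store schemas are assumptions; only the two table names and the store_id column come from the thread):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical schemas; both tables carry a store_id column.
employee = spark.createDataFrame(
    [(1, "Alice", 10), (2, "Bob", 20)],
    ["employee_id", "name", "store_id"],
)
store = spark.createDataFrame(
    [(10, "Downtown"), (20, "Uptown")],
    ["store_id", "store_name"],
)

# Joining on an expression keeps BOTH store_id columns in the result.
joined = employee.join(store, employee["store_id"] == store["store_id"])
print(joined.columns)
# ['employee_id', 'name', 'store_id', 'store_id', 'store_name']

# A bare reference to "store_id" on the joined frame is therefore ambiguous.
# Per the discussion above, Spark resolves it (to store's column in the
# original example), while the DuckDB Spark API simply picks the first
# store_id it finds.
```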
Throws error:
Possible changes that resolve the issue
If I run `.alias()` after `.select()`, the table seems to be re-aliased and the query below works just fine:
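A sketch of what that workaround could look like (the original query isn't reproduced above, so the emp/st alias names and the selected columns are assumptions; only the `.select()` → `.alias()` ordering and the store_id column come from the report):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical schemas, same as in the earlier sketch.
employee = spark.createDataFrame(
    [(1, "Alice", 10), (2, "Bob", 20)],
    ["employee_id", "name", "store_id"],
)
store = spark.createDataFrame(
    [(10, "Downtown"), (20, "Uptown")],
    ["store_id", "store_name"],
)

# Re-alias each side *after* the select, then use qualified references
# so neither engine has to guess which store_id is meant.
emp = employee.select("employee_id", "name", "store_id").alias("emp")
st = store.select("store_id", "store_name").alias("st")

result = (
    emp.join(st, col("emp.store_id") == col("st.store_id"))
       .select(col("emp.name"), col("st.store_id"), col("st.store_name"))
)
result.show()
```

Qualifying every reference through the alias sidesteps the left-to-right resolution question entirely, which is presumably why the re-aliased query works in both engines.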