> > @kevinzwang wouldn't this be better to handle directly in the logical plan? That would allow us to support non equi joins in the dataframe api as well.
I agree. However, we would potentially need to have a concept of table-associated columns in our logical plan as well as make changes to our join op struct.
I think we should definitely do it in the future, but I didn't want to broaden the scope of this PR. Additionally, I think we would still need to have SQL-specific logic to identify the join keys for each side in a query, so a lot of the work in this PR is still relevant.
Let me know what you think!
I think we would still need to have SQL-specific logic to identify the join keys for each side in a query, so a lot of the work in this PR is still relevant.
that sounds good for now. I think an interesting dataframe case is df1.join(df2, on=(df1["a"] == df2["a"] & df2["b"] > 0)). How would the dataframe side identify df1.a vs df2.a? Wouldn't we need similar logic to handle this case?
Additionally, how would you express this using the dsl col syntax instead of bracket notation. col("df1.a")? or df1.col("a")
I think we should definitely do it in the future, but I didn't want to broaden the scope of this PR.
I'll open up an issue for dataframe non-equi joins just so we don't lose this!
that sounds good for now. I think an interesting dataframe case is
df1.join(df2, on=(df1["a"] == df2["a"] & df2["b"] > 0))
. How would the dataframe side identify df1.a vs df2.a? Wouldn't we need similar logic to handle this case?Additionally, how would you express this using the dsl
col
syntax instead of bracket notation.col("df1.a")
? ordf1.col("a")
I'll open up an issue for dataframe non-equi joins just so we don't lose this!
Originally posted by @universalmind303 in https://github.com/Eventual-Inc/Daft/issues/3371#issuecomment-2489903359