Open lostbean opened 2 months ago
The best option is most likely to massage the data frame you are joining beforehand, by creating the collapsed column and then joining on that.
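To make the "collapse first, then join" idea concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in rather than Explorer or Polars, and the `orders`/`products` tables, their columns, and the use of `COALESCE` as the collapsing step are all hypothetical illustrations, not taken from the original report.

```python
import sqlite3

# Hypothetical data: an order may carry a primary code, a fallback code, or neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, primary_code TEXT, fallback_code TEXT);
    CREATE TABLE products (code TEXT, name TEXT);
    INSERT INTO orders VALUES (1, 'A', NULL), (2, NULL, 'B'), (3, NULL, NULL);
    INSERT INTO products VALUES ('A', 'apple'), ('B', 'banana');
""")

# Variant 1: the expression lives directly in the ON clause.
expr_join = conn.execute("""
    SELECT o.order_id, p.name
    FROM orders AS o
    LEFT JOIN products AS p
      ON p.code = COALESCE(o.primary_code, o.fallback_code)
    ORDER BY o.order_id
""").fetchall()

# Variant 2: build the collapsed column first, then do a plain equality join.
precomputed_join = conn.execute("""
    WITH prepared AS (
        SELECT order_id, COALESCE(primary_code, fallback_code) AS join_code
        FROM orders
    )
    SELECT prepared.order_id, p.name
    FROM prepared
    LEFT JOIN products AS p ON p.code = prepared.join_code
    ORDER BY prepared.order_id
""").fetchall()

conn.close()
print(expr_join)
print(precomputed_join)
```

Both variants return the same rows here, which is why pre-computing the column is often enough when the condition reduces to an equality on a derived value.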
@cigrainger is this something you want to support?
In many cases it is best to massage the dataframe, but this is a legit use case and there's a lot to be said for expressions in the `on` arg. dplyr offers `join_by`. Generally, their joining game is much more advanced than ours and it's something we should be considering. I'm not a huge fan of where I ended up originally with the tuple list approach.
AFAIK, this is also not supported in the Polars Python API, and back in the days when we needed it we were falling back to one of the two options below:
- `polars-sql` if it's a simple query
- `duckdb` and back to polars: `df()` (not a big deal because there is a zero-copy integration between the two)

So in this case I think we can utilize and enhance the `sql` method we have in `Explorer.DataFrame`.
Right now it takes a single `df` and a single `table_name` to register, whereas what we would need is something like `lf_sql(sql: string, registry: [(df, table_name), ...])`. Am I correct?

pub fn lf_sql(
    lf: ExLazyFrame,
    sql_string: &str,
    table_name: &str,
) -> Result<ExLazyFrame, ExplorerError> {
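For illustration only, the "registry of named tables" idea could be sketched as follows. This is not Explorer's or Polars' API: it uses Python's built-in `sqlite3`, and the `run_sql` helper, the `(table_name, columns, rows)` entry shape, and the sample tables are all hypothetical.

```python
import sqlite3


def run_sql(sql, registry):
    """Run `sql` against several named in-memory tables.

    `registry` is a list of (table_name, column_names, rows) entries, a
    hypothetical stand-in for the [(df, table_name), ...] shape discussed above.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for table_name, columns, rows in registry:
            column_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            # Register each table under its own name before running the query.
            conn.execute(f'CREATE TABLE "{table_name}" ({column_list})')
            conn.executemany(
                f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
            )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


result = run_sql(
    """
    SELECT o.order_id, p.name
    FROM orders AS o
    LEFT JOIN products AS p
      ON p.code = COALESCE(o.primary_code, o.fallback_code)
    ORDER BY o.order_id
    """,
    [
        ("orders", ["order_id", "primary_code", "fallback_code"],
         [(1, "A", None), (2, None, "B")]),
        ("products", ["code", "name"], [("A", "apple"), ("B", "banana")]),
    ],
)
print(result)
```

With more than one table registered, a single SQL statement can express the join condition directly, which is the capability a single `table_name` argument cannot offer.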
Description:
I would like to propose a new feature in Elixir Explorer that allows the `on` option in the `join` function to support more complex expressions. Currently, Explorer supports join operations only with simple equality checks between columns (e.g., `on: [{"column1", "column2"}]`). To allow joining tables on complex conditions that cannot be expressed with column names alone, it would be beneficial to extend the join `on` clause to accept Explorer expressions.

Example Use Case:
Consider the following SQL query, where a complex condition is used in the `ON` clause to perform a left join:

Currently, to achieve this behavior in Explorer, one has to perform two joins, as shown below:
Proposed Enhancement:
I propose that the `on` option be enhanced to allow expressions, making it possible to perform complex joins more succinctly. Ideally, the code would look something like this: