Open mhanberg opened 6 months ago
It is not exposed (unless I missed it too!). I think it'd be a great addition. Though it looks like it'd be a good deal of work to add it so it might take a while.
I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer). I can try to think of a version of my problem that I can share publicly if needed.
If you want to ask on elixirforum.com, feel free to @- me and I can try to answer. My handle is the same as on GitHub.
Oh, I didn't know we had fold. It seems it works with expressions, which means we can use the structure in Explorer.Query to fold over anything and it will be performant. I don't think it would be that complicated then! My suggestion is to call it reduce_with, to mirror map_with and friends!
So it seems there's fold_exprs and reduce_exprs. The difference seems to be reduction col-wise vs. row-wise. I think we'd want to include both?
They also have a few exprs pairs like sum and sum_horizontal. Maybe we want to call them reduce_with and reduce_with_horizontal? reduce and fold are basically synonyms to me.
Also looking over the docs, I think there's a lot of potential in exposing many of their exprs:
Sorry, I got fold and reduce mixed up. If it is operating on the columns themselves, then we can probably add it to Explorer.Query directly. We already support column traversal via across/query.
I am more interested in the reduce version that works within a single column.
Yeah agreed! It'd be super useful in summarise.
We already support column traversal via across/query.
If I'm reading this correctly (I've not confirmed it yet), then the reduce_with_horizontal reduces across the columns:
df = DF.new(a: [1, 2, 3], b: [10, 20, 30], c: [100, 200, 300])
+--------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 3]  |
+--------------+--------------+--------------+
|      a       |      b       |      c       |
|    <s64>     |    <s64>     |    <s64>     |
+==============+==============+==============+
| 1            | 10           | 100          |
+--------------+--------------+--------------+
| 2            | 20           | 200          |
+--------------+--------------+--------------+
| 3            | 30           | 300          |
+--------------+--------------+--------------+
mutate(df, sum: reduce_horizontal(cols(), 0, fn col, acc ->
col + acc
end))
+-------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 4] |
+----------+----------+----------+----------+
|    a     |    b     |    c     |   sum    |
|  <s64>   |  <s64>   |  <s64>   |  <s64>   |
+==========+==========+==========+==========+
| 1        | 10       | 100      | 111      |
+----------+----------+----------+----------+
| 2        | 20       | 200      | 222      |
+----------+----------+----------+----------+
| 3        | 30       | 300      | 333      |
+----------+----------+----------+----------+
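To pin down the two reduction directions being discussed, here is a minimal sketch in plain Elixir using only Enum, with no Explorer calls. The names columns, row_sums and a_total are made up for illustration, and this is not how a reduce_horizontal in Explorer would be implemented; it only shows the semantics I'd expect from the example above.

```elixir
# Plain-Elixir sketch of the semantics only; no Explorer or Polars calls.
# `columns`, `row_sums`, and `a_total` are illustrative names.
columns = [
  a: [1, 2, 3],
  b: [10, 20, 30],
  c: [100, 200, 300]
]

n_rows = columns |> Keyword.fetch!(:a) |> length()

# "Horizontal" fold: start from a column of zeros and combine each column
# into the accumulator element-wise, producing one value per row.
row_sums =
  columns
  |> Keyword.values()
  |> Enum.reduce(List.duplicate(0, n_rows), fn col, acc ->
    Enum.zip_with(col, acc, fn x, y -> x + y end)
  end)

# "Vertical" reduce: collapse a single column down to one value.
a_total =
  columns
  |> Keyword.fetch!(:a)
  |> Enum.reduce(0, fn x, acc -> x + acc end)

IO.inspect(row_sums) # [111, 222, 333]
IO.inspect(a_total)  # 6
```

The first result matches the sum column in the table above; the second is the single-column case that would be useful in summarise.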
Our comprehensions only make the same call to mutate/filter/etc. with different columns more ergonomic. This would let you actually compute multi-column things.
In fact, I wonder if we could make the :reduce option to for syntactic sugar for this?... 🤔
We certainly could but perhaps @cigrainger has ideas on the API for this. @cigrainger, can we "fold" across columns in dplyr?
The equivalent in dplyr would be accomplished with something like this:
df
|> mutate(sum(c_across(starts_with("Bud"))))
It's kind of gross, but quite similar to mutate(df, sum: reduce_horizontal(...))
There used to be a rowwise() wrapper that also felt a bit off.
Description
Is it possible to expose the folds API from Polars?
I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer).
I can try to think of a version of my problem that I can share publicly if needed.
Also, if this API is already exposed and I just missed it... please let me know 😅.
Thanks!