Closed ryancurtin closed 2 months ago
Does cond
work?
If not, can you provide an example?
Thanks for your response. cond
may not really be what I want, as it has lots of the same drawbacks as mutate
. I've created a simple example to further elaborate on my use case:
# Stored in DB, can have complicated logic (i.e. multiple conditons, and/or evaluation, etc).
# For simplicity's sake, here are two very basic rules
rules = [
%Rule{column: "b", conditions: [%{column: "a", operator: ">", value: 100}], match_value: "test", result_value: "good"},
%Rule{column: "b", conditions: [%{column: "a", operator: ">", value: 500}], match_value: "good", result_value: "great"},
]
df = DF.new(%{
a: [10, 100, 1000],
b: ["test", "test", "test"]
})
Enum.reduce(rules, df, fn %Rule{column: column, conditions: conditions, match_value: match, result_value: result}, acc_df ->
DF.mutate_with(acc_df, fn ldf ->
# Pseudo-code for what I'd want to happen below - I currently have code to parse and evaluate
# the rules into a boolean Series, but with no way of using it in `mutate_with`.
conditions_passed = Evaluator.evaluate_rules(conditions, ldf)
if conditions_passed do
%{column => Series.replace(ldf[column], match, result)}
else
%{}
end
end)
end)
The conditions for rules don't have any bespoke operations. It'd be just equality checks and comparison operators such as >, <, >=, <=, etc. With that being the case, I was optimistic there is a solution buried somewhere in the existing Explorer API. Please let me know if you'd like any further clarification.
Thanks for the example. The cond
/if
/else
inside mutate
compiles down to Polars when/then/else, so it is not clear to me how they are different or why cond
is not enough. Can you expand your executable code to something we can actually execute, perhaps a Mix.install([:explorer])
script, so we can play with it and try to find a working solution?
Sure thing, I appreciate you looking into this further. I've created a gist with my example: https://gist.github.com/ryancurtin/f6c6c00450528dd8dd9f094f9472e3a3
As I noted in the comments of the code, the biggest issue I have is being able to properly represent conditions in an Explorer.Query
. I tried to demonstrate a reasonably complex rule (an OR comparison between the two of them), but maybe the heart of the problem is if it's possible (or even recommended) to generate a complex query with a dynamic operation (i.e. >, <, >=, <= can be specified with a variable) like that programmatically. Essentially I'd want that Evaluator
module in my example to produce a Query
instead of a Series
.
For your final evaluation -- the TODO
in your gist -- I believe you're looking for Series.select/3
.
Doing if boolean_series do ...
won't work because boolean_series
contains many boolean values. So unless it makes sense in your use case to use Series.all?/1
or Series.any?/1
to collapse the boolean_series
down to a single boolean (and I don't think it does), you'll need to let select
decide which value to take on a row-by-row basis.
As for the confusion around how to iteratively build up a Explorer.Query
based on multiple RuleObject
s: I'd say there isn't a "elegant" way to do what you're trying to accomplish right now (if I've understood it). Today, our Explorer.Query
construct works based off callbacks. So your reducer would need to iteratively wrap the accumulated callback in another callback. Something like this:
lazy_fun =
Enum.reduce(rules, fn x -> x end, fn rule, acc_fun ->
rule_fun = turn_rule_into_fun(rule)
fn x -> x |> acc_fun.() |> rule_fun.() end
end)
DF.mutate_with(df, lazy_fun)
Final thought: your gist was a fairly long and I admit I only skimmed it. In future, try simplifying your examples as much as possible. Minimal working examples (MWEs) are extremely valuable. ATM, I'm still not sure if I fully understood what you goal is and have made my best guess.
@billylanchantin - I can confirm that Series.select/3
is exactly what I want! I was trying to perform string replacement on an individual element in a Series
only if a bunch of boolean logic (based on either a scalar value or an element in another Series
) evaluated to true. Series.select/3
seems to give me that in spades. I'm not sure how I overlooked that, though I guess the name of the function is fairly generic at first glance.
Thanks for confirming my interpretation of query, and that what I was looking to do is also possible building up functions.
RE: Examples - definitely noted for the future. I'd say my first code snippet had most of the relevant code and I hadn't quite conveyed enough about what I was trying to do. The gist just had context around the evaluation of rules and thus wasn't entirely necessary to understand the problem (skeleton functions with typespecs probably would have been enough).
Glad to hear it has been addressed!
Btw, regarding the examples/gist, I think if the first example was executable, it would have been perfect in size. :) But I am glad we could all build on top of each other's feedback.
I was also able to make this work:
require Explorer.DataFrame, as: DF
rules = [
%{
column: "b",
conditions: [%{column: "a", operator: "greater", value: 100}],
match_value: "test",
result_value: "good"
},
%{
column: "b",
conditions: [%{column: "a", operator: "greater", value: 500}],
match_value: "good",
result_value: "great"
}
]
df =
DF.new(%{
a: [10, 100, 1000],
b: ["test", "test", "test"]
})
defmodule Evaluator do
def evaluate(df, conditions) do
conditions
|> Enum.map(fn %{column: column, operator: op, value: value} ->
apply(Explorer.Series, String.to_existing_atom(op), [df[column], value])
end)
|> Enum.reduce(&Explorer.Series.and/2)
end
end
Enum.reduce(rules, df, fn %{
column: column,
conditions: conditions,
match_value: match,
result_value: result
},
acc_df ->
DF.mutate(
acc_df,
[
{^column,
if Evaluator.evaluate(df(), ^conditions) do
select(col(^column) == ^match, ^result, col(^column))
else
col(^column)
end}
]
)
end)
|> IO.inspect()
But it requires pushing a df()
macro to Explorer.Query, which I will do soon. But @billylanchantin nailed it, you don't need the conditional above, you could instead write it as select(Series.and(condition, df[column] == match))
and that will likely be clearer anyway.
@josevalim - Appreciate the detailed clarification! The df()
macro was the construct I was missing, though I'm not sure I would have been able to properly articulate that. Knowing this is all possible with mutate (and your response about compiling down to when/then/else) helps a lot!
My use case involves mutating data in a particular column, but only if a certain condition matches. Mutate supports this functionality now, but since my rules can come from an external data source, I'd need to do this programmatically. Polars has the when function for this operation.
I'd like to avoid creating an intermediate column to evaluate my conditions first or using
Series.transform
so I was wondering if there's a recommended approach. I'm familiar with the library and I have combed through the docs quite a bit, but it wouldn't be the first time I've overlooked something obvious.