Nixtla / utilsforecast

https://nixtlaverse.nixtla.io/utilsforecast
Apache License 2.0
35 stars 6 forks source link

fix rmae #103

Closed concimuscb closed 4 days ago

concimuscb commented 2 weeks ago

Fix to #101

https://github.com/Nixtla/utilsforecast/issues/101

CLAassistant commented 2 weeks ago

CLA assistant check
All committers have signed the CLA.

jmoralez commented 2 weeks ago

Hey. Thanks for your contribution. I had fixed this in a separate branch by doing this:

diff --git a/utilsforecast/losses.py b/utilsforecast/losses.py
index 1c6ec4b..45864f8 100644
--- a/utilsforecast/losses.py
+++ b/utilsforecast/losses.py
@@ -327,16 +327,15 @@ def rmae(
     """
     numerator = mae(df, models, id_col, target_col)
     denominator = mae(df, baseline_models, id_col, target_col)
+    denom_rename = {m: f"{m}_denominator" for m in baseline_models}
+    denominator = ufp.rename(denominator, denom_rename)
+    res = ufp.join(numerator, denominator, on=id_col)
     if isinstance(numerator, pd.DataFrame):
-        res = numerator.merge(denominator, on=id_col, suffixes=("", "_denominator"))
-        out_cols = [id_col]
         for model, baseline in zip(models, baseline_models):
             col_name = f"{model}_div_{baseline}"
             res[col_name] = (
                 res[model].div(_zero_to_nan(res[f"{baseline}_denominator"])).fillna(0)
             )
-            out_cols.append(col_name)
-        res = res[out_cols]
     else:

         def gen_expr(model, baseline):
@@ -348,10 +347,11 @@ def rmae(
                 .alias(f"{model}_div_{baseline}")
             )

-        res = numerator.join(denominator, on=id_col, suffix="_denominator")
         exprs = [gen_expr(m1, m2) for m1, m2 in zip(models, baseline_models)]
-        res = res.select([id_col, *exprs])
-    return res
+    model_cols = [
+        f"{model}_div_{baseline}" for model, baseline in zip(models, baseline_models)
+    ]
+    return res[[id_col] + model_cols]

Can you please do that instead?

About the single baseline, I think we can support the baseline model being a string and repeating that string for each model. That maintains the current behavior and also allows that use case, e.g. rmae(series, models=['model1', 'model2'], baseline_models='model0')

jmoralez commented 2 weeks ago

On a second thought, I've never liked the output of the rmae, it's very strange when used in evaluate. I think this is a good opportunity to change the baseline model to be a single string and use it for every model. That way, when used with evaluate it won't add extra columns. It'll still require a change in the evaluate function to append the baseline name to the metric, e.g. rmae_naive or similar, but I can do that in a followup PR.

concimuscb commented 2 weeks ago

Both approaches seem good to me.

As long as the most common scenario (baselining several models against one) is easy to do then I think it makes sense. Furthermore, if the baseline model is among the baselined models, then 1.0 should be returned as an evaluation.

Should I close this PR then?

jmoralez commented 5 days ago

If you want to incorporate the suggested changes in https://github.com/Nixtla/utilsforecast/pull/103#issuecomment-2209339010 please go ahead, otherwise you can close it and I'll make them in a separate PR.

jmoralez commented 4 days ago

Superseded by #105