Calling RMAE on a crossvalidation DataFrame with some NaN forecasts RMAE evaluates them to 0.0

Nixtla / utilsforecast

https://nixtlaverse.nixtla.io/utilsforecast

Apache License 2.0

Calling RMAE on a crossvalidation DataFrame with some NaN forecasts RMAE evaluates them to 0.0 #102

Closed concimuscb closed 4 months ago

concimuscb commented 4 months ago

Calling:

from utilsforecast.losses import rmae

baseline_models = ["Naive"]
# Every column that is not an identifier, timestamp, cutoff, or target is a model.
models = crossvalidation_df.drop(
    columns=["unique_id", "ds", "cutoff", "y"]
).columns.tolist()

# Evaluate each cross-validation window separately.
for cutoff in crossvalidation_df["cutoff"].unique():
    cutoff_model_evals = rmae(
        crossvalidation_df[crossvalidation_df["cutoff"] == cutoff],
        models=models,
        baseline_models=list(reversed(models)),
    )

A SeasonalWA model is included in the list of models, and its forecasts contain NaNs. After calling rmae with the code above, every column that involves SeasonalWA (either as the baseline or as the model being evaluated) comes back as 0.0.

concimuscb commented 4 months ago

The behavior comes from the rmae function in the losses file:

res[col_name] = (
    res[model].div(_zero_to_nan(res[f"{baseline}_denominator"])).fillna(0)
)

This is the piece of code responsible: the fillna(0) call turns every NaN ratio into 0.0, so removing it should leave those entries as NaN. I am not sure whether there are any arguments against switching to NaN but, as it stands, the 0.0 can lead to an erroneous selection of the best model, because a model with NaN forecasts wins whenever all the others have an RMAE above 0.
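To make the effect concrete, here is a minimal pandas-only sketch. It does not call utilsforecast: the per-series MAE values, the `ModelA` name, and the `zero_to_nan` helper are hypothetical stand-ins for what the library computes internally, so treat it as an illustration of the mechanism rather than the actual implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical per-series MAE values. SeasonalWA produced NaN forecasts
# for series "B", so its error for that series is NaN.
mae = pd.DataFrame(
    {"Naive": [2.0, 4.0], "SeasonalWA": [1.0, np.nan]},
    index=pd.Index(["A", "B"], name="unique_id"),
)


def zero_to_nan(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the private helper: replace 0 with NaN."""
    return series.where(series != 0, np.nan)


# Same shape as the quoted line: divide, then fill missing ratios with 0.
ratio_filled = mae["SeasonalWA"].div(zero_to_nan(mae["Naive"])).fillna(0)
# Without fillna, the missing ratio stays missing.
ratio_nan = mae["SeasonalWA"].div(zero_to_nan(mae["Naive"]))

print(ratio_filled.tolist())  # [0.5, 0.0]
print(ratio_nan.tolist())     # [0.5, nan]

# Picking the lowest score for series "B" among two models
# ("ModelA" is a made-up name with a made-up score).
scores = pd.Series({"ModelA": 0.8, "SeasonalWA": ratio_nan["B"]})
best_filled = scores.fillna(0).idxmin()  # the NaN model wins with 0.0
best_nan = scores.idxmin()               # NaN is skipped by default
print(best_filled, best_nan)  # SeasonalWA ModelA
```

With the fill in place, the model that failed to forecast looks like the best one; leaving the ratio as NaN keeps it out of the comparison.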

jmoralez commented 4 months ago

In my opinion having NaNs in the baseline is the real issue here.

If we remove the fillna, those entries would be left as NaN and the rmae would be NaN, I think. Is that your expected behavior?

concimuscb commented 4 months ago

In my eyes, the ideal behavior would be to raise an error if the baseline has NaN values. I would expect that if someone chooses a certain model as the baseline, that baseline can be evaluated for the entire dataset.

Provided that the baseline does not have NaN values, if the model being compared against it has NaN values, I would keep the result as NaN to prevent erroneous model selection.
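The behavior proposed above could look roughly like the sketch below. `relative_mae` is a hypothetical helper written for this illustration, not a function in utilsforecast, and it assumes the per-series MAE values have already been computed.

```python
import numpy as np
import pandas as pd


def relative_mae(model_mae: pd.Series, baseline_mae: pd.Series) -> pd.Series:
    """Hypothetical helper: ratio of model error to baseline error per series."""
    # A baseline is expected to be evaluable everywhere, so fail loudly.
    if baseline_mae.isna().any():
        raise ValueError("baseline contains NaN values")
    # A zero baseline error would divide by zero, so treat it as missing.
    denominator = baseline_mae.where(baseline_mae != 0, np.nan)
    # No fillna here: NaN in the model's error propagates to the ratio.
    return model_mae.div(denominator)


baseline = pd.Series([2.0, 4.0], index=["A", "B"])
model = pd.Series([1.0, np.nan], index=["A", "B"])

# NaN in the evaluated model is kept as NaN.
result = relative_mae(model, baseline)
print(result.tolist())  # [0.5, nan]

# NaN in the baseline raises instead of silently producing a score.
raised = False
try:
    relative_mae(baseline, model)
except ValueError:
    raised = True
print(raised)  # True
```

Raising on a NaN baseline surfaces the data problem at evaluation time, while propagating NaN for the evaluated model keeps it from being selected by a minimum-score rule.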