JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

Feature request: Allow the training set to be passed to custom error metrics #651

Open KishManani opened 4 months ago

KishManani commented 4 months ago

Error metrics such as the MASE and RMSSE require computing scaling factors from the target variable in the training set. Currently there is no way to compute these metrics. It would be nice to allow users to compute them, or to include these metrics by default.
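
For reference, a minimal sketch of why these metrics need the training series, using the conventional MASE definition (the function name and signature here are illustrative, not part of skforecast):

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Mean Absolute Scaled Error (illustrative sketch).

    The scaling factor is the in-sample MAE of a seasonal naive forecast
    with seasonality m, computed on the *training* series, which is why
    the training set has to be available to the metric.
    """
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale
```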

JavierEscobarOrtiz commented 4 months ago

Hello @KishManani

Thanks for opening the issue! It is a great point, and it would give users more flexibility.

I wonder if you would be able to do this using a ForecasterEquivalentDate + backtesting with a custom metric 🤔 (but it is true that this is more labor-intensive for the user).

What do you think @JoaquinAmatRodrigo ?

https://skforecast.org/latest/user_guides/forecasting-baseline

https://skforecast.org/latest/user_guides/backtesting#backtesting-with-custom-metric

KishManani commented 3 months ago

Hi @JavierEscobarOrtiz !

> I wonder if you would be able to do this using a ForecasterEquivalentDate + backtesting with a custom metric 🤔 (but it is true that this is more labor-intensive for the user).

I agree that this would be very laborious for a user for what is a relatively common error metric.

I also want to add that it would be nice to compute error metrics that are pooled over multiple time series, for example the normalised deviation (ND) and normalised RMSE (NRMSE). See the definitions here:

https://arxiv.org/pdf/1704.04110.pdf

[Image: definitions of the ND and NRMSE metrics from the paper]

These metrics are recommended in this review paper: https://link.springer.com/article/10.1007/s10618-022-00894-5

So it would be nice to be able to compute them.
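
A rough sketch of these pooled metrics as I read the definitions in the paper, where all series and time steps are pooled before normalising (function names are illustrative only):

```python
import numpy as np

def normalised_deviation(y_true, y_pred):
    """ND: sum of absolute errors over all series and time steps,
    divided by the sum of absolute target values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

def normalised_rmse(y_true, y_pred):
    """NRMSE: RMSE over all series and time steps, divided by the
    mean absolute target value."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(np.abs(y_true))
```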

Thank you! Kishan

JoaquinAmatRodrigo commented 2 weeks ago

Hi @KishManani. We are planning to provide these features in the next release. For that, we will probably generalize the calculation of the metrics to use a function that takes four arguments: y_pred and y_real (mandatory), and y_pred_train and y_real_train (optional).
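
A sketch of how a custom metric with that signature might look (argument names taken from the description above; the exact interface and how it is wired into backtesting are still to be decided), using RMSSE as an example:

```python
import numpy as np

def rmsse(y_pred, y_real, y_pred_train=None, y_real_train=None):
    """Root Mean Squared Scaled Error (illustrative sketch).

    y_pred_train is accepted to match the proposed signature but is not
    needed here; the scale comes from the training target only.
    """
    if y_real_train is None:
        raise ValueError("RMSSE needs the training target to compute its scale.")
    y_pred, y_real = np.asarray(y_pred), np.asarray(y_real)
    # In-sample mean squared error of the 1-step naive forecast
    scale = np.mean(np.diff(np.asarray(y_real_train)) ** 2)
    return np.sqrt(np.mean((y_real - y_pred) ** 2) / scale)
```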

There is one corner case that we would like to discuss further. When using backtesting with a refit strategy, say n refits, there are n groups of in-sample predictions (some of them may overlap depending on the refit strategy). Do you see any inconvenience in pooling them all?

For multi-series models, we already pool the metric across all selected series using a weighted average, where the weight is the length of the predicted values for each series.
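
For illustration, a minimal sketch of that weighted-average pooling as described above (the helper name and the dictionary-of-series layout are assumptions, not the actual skforecast internals):

```python
import numpy as np

def pooled_metric(metric, y_true_per_series, y_pred_per_series):
    """Weighted average of a per-series metric, weighted by the number
    of predicted values of each series."""
    values, weights = [], []
    for series_id, y_true in y_true_per_series.items():
        y_pred = y_pred_per_series[series_id]
        values.append(metric(np.asarray(y_true), np.asarray(y_pred)))
        weights.append(len(y_pred))
    return np.average(values, weights=weights)
```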

KishManani commented 1 week ago

Hi @JoaquinAmatRodrigo!

> We are planning to provide these features in the next release. For that, we will probably generalize the calculation of the metrics to use a function that takes four arguments: y_pred and y_real (mandatory), and y_pred_train and y_real_train (optional).

Sounds good! Just FYI, in my opinion real is not a good term to use here. My suggestion would be to use y_true to make it more consistent with sklearn (this reduces some cognitive load for people familiar with sklearn).

> There is one corner case that we would like to discuss further. When using backtesting with a refit strategy, say n refits, there are n groups of in-sample predictions (some of them may overlap depending on the refit strategy). Do you see any inconvenience in pooling them all?

My understanding is that for each backtest fold (not necessarily each refit, since we might refit intermittently but can still compute the error metric for folds where we did not refit), we compute part of the error metric on the data prior to the forecast horizon and then use it to rescale the metric in the forecast horizon. For example, for MASE we compute the MAE of a naive 1-step forecast (not the in-sample errors of the fitted model) on the training data, and then divide the MAE in the forecast horizon by that value.

I don't understand your question in this context. Could you give a specific example? Thank you!

> For multi-series models, we already pool the metric across all selected series using a weighted average, where the weight is the length of the predicted values for each series.

This does not reproduce the NRMSE and ND metrics above, right?