awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

get_aggregate_metrics() returns `inf` for MSIS and MASE when just a single target series in the dataset is invariant across time #3097

Open Serendipity31 opened 11 months ago

Serendipity31 commented 11 months ago

Description

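Consider the situation where a dataset contains at least one target series that is invariant across time, i.e. every value in that series is identical.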

Although there are many situations where it would make the most sense to exclude such a series from the dataset, there are other situations where it makes conceptual or practical sense to include it.

However, the consequence of including a series like this is that calculate_seasonal_error returns seasonal_error=0.00. Although it is accurate to say in this case that seasonal error is equal to zero, the functions that calculate various metrics don't follow through on the implications of this result (i.e. that certain evaluation metrics that require seasonal error to be greater than zero are then not applicable to series that have seasonal_error=0.0).
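To make the zero concrete, here is a minimal sketch of a seasonal error computed as the mean absolute difference between values one season apart. The function name `seasonal_error_sketch` and the example arrays are assumptions for illustration; this is not the library's calculate_seasonal_error, only the same idea in plain NumPy.

```python
import numpy as np


def seasonal_error_sketch(past_data: np.ndarray, seasonality: int = 1) -> float:
    """Mean absolute difference between values `seasonality` steps apart."""
    if seasonality >= len(past_data):
        seasonality = 1  # fall back to lag 1 when the series is too short
    earlier = past_data[:-seasonality]
    later = past_data[seasonality:]
    return float(np.mean(np.abs(later - earlier)))


# Hypothetical series: one constant, one that varies.
constant_series = np.full(48, 7.0)
varying_series = np.array([1.0, 2.0, 4.0, 7.0])

print(seasonal_error_sketch(constant_series, seasonality=24))  # 0.0
print(seasonal_error_sketch(varying_series))                   # 2.0
```

Any constant series gives exactly zero here regardless of the chosen seasonality, because every lagged difference is zero.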

The metrics calculated in Evaluator that only apply to series with seasonal_error >0 are MSIS and MASE. The functions that calculate these metrics (i.e. msis and mase) do not perform any checks to ensure they are applicable before the calculation proceeds. Instead of returning a value that indicates they are not applicable to a given series, they return a value of inf. These inf entries are then stored in the per-series dataframe produced by get_metrics_per_ts. At best this is misleading.
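The behaviour can be illustrated with a small sketch. The functions `mase_sketch` and `msis_sketch` below are hypothetical stand-ins that follow the usual definitions of the two metrics (an error term divided by the seasonal error); they are not copied from the library, and the example numbers are made up.

```python
import numpy as np


def mase_sketch(target, forecast, seasonal_error):
    # Mean absolute error scaled by the seasonal error, with no applicability check.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(np.abs(target - forecast)) / np.float64(seasonal_error)


def msis_sketch(target, lower, upper, seasonal_error, alpha=0.05):
    # Interval width plus penalties for targets outside the interval,
    # scaled by the seasonal error, again with no applicability check.
    width = upper - lower
    below = 2.0 / alpha * (lower - target) * (target < lower)
    above = 2.0 / alpha * (target - upper) * (target > upper)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(width + below + above) / np.float64(seasonal_error)


# Hypothetical constant target with an imperfect forecast.
target = np.array([7.0, 7.0, 7.0])
forecast = np.array([6.5, 7.2, 7.4])
lower = np.array([6.0, 6.0, 6.0])
upper = np.array([8.0, 8.0, 8.0])

mase_value = mase_sketch(target, forecast, seasonal_error=0.0)
msis_value = msis_sketch(target, lower, upper, seasonal_error=0.0)
print(mase_value, msis_value)  # inf inf
```

In both cases a positive numerator is divided by zero, so the result is inf rather than a marker that the metric does not apply.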

The bigger issue with inf being returned at the per-series level is that this result propagates forwards into the aggregated metric values returned by get_aggregate_metrics. The 'inf' values in the aggregated metrics for MSIS and MASE are problematic because they communicate only that there is at least one series in the dataset for which MASE and MSIS are not applicable metrics. This discards potentially significant information about model quality for all the series in the dataset for which these two metrics are applicable.
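A small sketch of why one inf is enough to swamp the aggregate. The dataframe below is hypothetical, and a plain pandas mean is assumed as the aggregation; it is not the library's aggregation code, only an illustration of the arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical per-series metrics: series "b" is constant over time.
item_metrics = pd.DataFrame(
    {
        "item_id": ["a", "b", "c"],
        "seasonal_error": [1.5, 0.0, 2.0],
        "MASE": [0.8, np.inf, 1.2],
    }
)

# A mean that includes the inf entry is itself inf.
agg_mase = item_metrics["MASE"].mean()
print(agg_mase)  # inf
```

The values 0.8 and 1.2 for the other two series no longer influence the result at all.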

Possible Solution

I would propose that any per-series metric function that relies on seasonal_error > 0 should check for seasonal_error = 0.0 and return nan instead of inf for the per-series metric.
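One way such a check might look, as a sketch only. The name `mase_or_nan` is hypothetical and the function is a stand-in for a per-series metric function, not a patch against the actual source.

```python
import numpy as np


def mase_or_nan(target, forecast, seasonal_error):
    # MASE is undefined when the seasonal error is zero, so report "not applicable".
    if seasonal_error == 0:
        return float("nan")
    return float(np.mean(np.abs(target - forecast)) / seasonal_error)


# Hypothetical constant target with an imperfect forecast.
target = np.array([7.0, 7.0, 7.0])
forecast = np.array([6.5, 7.5, 7.0])

print(mase_or_nan(target, forecast, seasonal_error=0.0))  # nan
print(mase_or_nan(target, forecast, seasonal_error=0.5))
```

The same guard would apply to MSIS, since it divides by the same quantity.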

That would mean that nan gets stored in the dataframe with per-series metrics. In turn, this would mean that the default aggregation strategy (that ignores nan) would ignore those series in the aggregation of the MASE and MSIS metrics. The aggregation would then communicate information derived from all the series in the dataset where these metrics are applicable. The continued inclusion of seasonal_error in the per-series metric dataframe allows for traceability in that it is easy to identify any series that have a time invariant target to explain their exclusion from certain aggregate metrics.
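The effect on aggregation and the traceability point can be sketched as follows. The dataframe is hypothetical, and a pandas mean with its default nan-skipping behaviour is assumed in place of the library's aggregation strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical per-series metrics after the proposed change: nan instead of inf.
item_metrics = pd.DataFrame(
    {
        "item_id": ["a", "b", "c"],
        "seasonal_error": [1.5, 0.0, 2.0],
        "MASE": [0.8, np.nan, 1.2],
    }
)

# pandas skips nan by default, so only series "a" and "c" contribute.
agg_mase = item_metrics["MASE"].mean()

# The seasonal_error column still explains which series were left out.
excluded = item_metrics.loc[item_metrics["seasonal_error"] == 0, "item_id"].tolist()

print(agg_mase)
print(excluded)  # ['b']
```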

Alternatively each of the aggregation strategies could be modified to exclude any rows/array entries from the aggregation of MSIS and MASE that are associated with series that have seasonal_error=0.0.
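A sketch of this alternative, assuming the per-series metrics are available as a dataframe with a seasonal_error column. The helper `aggregate_applicable` and its `scaled_metrics` parameter are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def aggregate_applicable(item_metrics, scaled_metrics=("MASE", "MSIS")):
    # Keep only rows where the scaled metrics are defined (seasonal_error > 0).
    applicable = item_metrics["seasonal_error"] > 0
    return {
        name: float(item_metrics.loc[applicable, name].mean())
        for name in scaled_metrics
    }


# Hypothetical per-series metrics with inf left in place for series "b".
item_metrics = pd.DataFrame(
    {
        "item_id": ["a", "b", "c"],
        "seasonal_error": [1.5, 0.0, 2.0],
        "MASE": [0.8, np.inf, 1.2],
        "MSIS": [5.0, np.inf, 7.0],
    }
)

agg = aggregate_applicable(item_metrics)
print(agg)
```

This leaves the per-series values unchanged and moves the applicability decision into the aggregation step, at the cost of every aggregation strategy needing the same filter.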

To Reproduce

Take one of the GluonTS datasets and modify one of the series so that all of its values are identical to each other. Then train a DeepAR estimator, forecast from that model, and evaluate those forecasts (returning item_metrics and agg_metrics dataframes from an instance of Evaluator).

Environment