Description

Consider the situation where:

- One wants to build a global time series model from a set of related time series.
- The target for a single one of these series is constant for every non-missing data point.
Although there are many situations where it would make the most sense to exclude such a series from the dataset, there are others where it makes conceptual or practical sense to include it.
However, the consequence of including a series like this is that `calculate_seasonal_error` returns `seasonal_error=0.0`. Although it is accurate to say that the seasonal error is zero in this case, the functions that calculate the various metrics do not follow through on the implications of this result: evaluation metrics that require a seasonal error greater than zero are simply not applicable to series with `seasonal_error=0.0`.
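For intuition, here is a minimal sketch of a seasonal-error computation (the mean absolute seasonal difference used by MASE-style metrics; illustrative only, not the exact GluonTS implementation), showing why a constant target necessarily yields zero:

```python
import numpy as np

# Mean absolute seasonal difference of the in-sample target.
# A constant series differs from its own lagged copy by exactly
# zero everywhere, so the seasonal error is 0.0.
def seasonal_error(past_target: np.ndarray, seasonality: int = 1) -> float:
    return float(np.mean(np.abs(past_target[seasonality:] - past_target[:-seasonality])))

constant_series = np.full(100, 5.0)  # target identical at every point
print(seasonal_error(constant_series))  # 0.0
```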
The metrics calculated in `Evaluator` that only apply to series with `seasonal_error > 0` are MSIS and MASE. The functions that calculate these metrics (`msis` and `mase`) do not check that they are applicable before the calculation proceeds. Instead of returning a value that indicates they are not applicable to a given series, they return `inf`. These `inf` entries are then stored in the per-series dataframe produced by `get_metrics_per_ts`, which is misleading at best.
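Concretely, the `inf` arises because a positive numerator is divided by a zero denominator; an illustrative NumPy snippet (not the library's code):

```python
import numpy as np

# With seasonal_error == 0.0, a MASE-style ratio divides by zero.
# NumPy yields inf (with a RuntimeWarning) instead of signalling
# "metric not applicable".
abs_error = np.mean(np.abs(np.array([5.2, 4.8, 5.1]) - 5.0))  # > 0
seasonal_error = 0.0
with np.errstate(divide="ignore"):
    mase_like = abs_error / seasonal_error
print(mase_like)  # inf
```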
The bigger issue with `inf` being returned at the per-series level is that it propagates forward into the aggregated metric values returned by `get_aggregate_metrics`. The `inf` values in the aggregated MSIS and MASE communicate only that there is at least one series in the dataset for which these metrics are not applicable. This discards potentially significant information about model quality on all the series in the dataset for which these two metrics are applicable.
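A toy example of the propagation: a single `inf` entry makes the mean of otherwise well-behaved per-series values useless:

```python
import numpy as np

# One non-applicable series poisons the whole aggregate:
per_series_mase = np.array([0.8, 1.1, np.inf, 0.9])
print(np.mean(per_series_mase))  # inf -- the three finite values are lost
```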
Possible Solution
I would propose that any per-series metric function that relies on `seasonal_error > 0` should check for `seasonal_error = 0.0` and return `nan` instead of `inf` at the per-series level.
`nan` would then be stored in the per-series metrics dataframe. In turn, the default aggregation strategy (which ignores `nan`) would exclude those series from the aggregation of the MASE and MSIS metrics, so the aggregate would reflect all the series in the dataset for which these metrics are applicable. The continued inclusion of `seasonal_error` in the per-series metrics dataframe preserves traceability: it remains easy to identify any series with a time-invariant target and explain its exclusion from certain aggregate metrics.
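For comparison, with `nan` in place of `inf`, a `nan`-ignoring aggregation (e.g. `np.nanmean`, which the default pandas mean mirrors via `skipna=True`) recovers the information from the applicable series:

```python
import numpy as np

per_series_mase = np.array([0.8, 1.1, np.nan, 0.9])
# 0.933..., computed from the three applicable series only:
print(np.nanmean(per_series_mase))
```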
Alternatively, each of the aggregation strategies could be modified to exclude from the aggregation of MSIS and MASE any rows/array entries associated with series that have `seasonal_error=0.0`.
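A sketch of this alternative, filtering on `seasonal_error` before aggregating (the dataframe here is a hypothetical stand-in for the per-series metrics, with illustrative column names):

```python
import pandas as pd

item_metrics = pd.DataFrame(
    {
        "seasonal_error": [0.7, 0.0, 1.2],
        "MASE": [0.8, float("inf"), 1.1],
    }
)
# Aggregate MASE only over series where the metric is applicable:
applicable = item_metrics[item_metrics["seasonal_error"] > 0]
print(applicable["MASE"].mean())  # 0.95, ignoring the constant series
```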
To Reproduce
Take one of the GluonTS datasets and modify one of the series so that 100% of its values are identical. Then train a DeepAR estimator, forecast from the trained model, and evaluate those forecasts (returning the `item_metrics` and `agg_metrics` dataframes from an instance of `Evaluator`).
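A hedged reproduction sketch (module paths and trainer arguments vary across GluonTS versions; this follows the mxnet-based API):

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer
from gluonts.evaluation import Evaluator
from gluonts.evaluation.backtest import make_evaluation_predictions

freq, prediction_length = "D", 7
data = ListDataset(
    [
        {"start": "2020-01-01", "target": np.sin(np.arange(200)).tolist()},
        {"start": "2020-01-01", "target": [5.0] * 200},  # 100% identical values
    ],
    freq=freq,
)

estimator = DeepAREstimator(
    freq=freq, prediction_length=prediction_length, trainer=Trainer(epochs=2)
)
predictor = estimator.train(data)

forecast_it, ts_it = make_evaluation_predictions(data, predictor=predictor, num_samples=100)
agg_metrics, item_metrics = Evaluator()(ts_it, forecast_it, num_series=2)

# item_metrics shows MASE/MSIS == inf for the constant series, and
# agg_metrics["MASE"] / agg_metrics["MSIS"] come back as inf as well.
```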
Environment