awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

`backtest.backtest_metrics` produces results with the wrong type of `item_id` #2176

Closed dmitra79 closed 2 years ago

dmitra79 commented 2 years ago

Description

When the item_id values of the data instances are integers, or strings representing integers, backtest.backtest_metrics() produces an item_metrics data frame in which item_id is a float (see code below). This causes problems when joining the metrics with other data frames describing the instances, since the ids no longer match, i.e. '1' != 1.0.
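For illustration (made-up toy frames, not the actual backtest output), the mismatch means a lookup or join keyed on item_id silently finds nothing:

import pandas as pd

# item_id silently cast to float64, as in the item_metrics returned by backtest_metrics
item_metrics = pd.DataFrame({'item_id': [0.0, 1.0, 2.0], 'MSE': [1.96, 1.69, 0.83]})

# metadata about the same series, keyed by the original string ids
attributes = pd.DataFrame({'item_id': ['0', '1', '2'], 'region': ['a', 'b', 'c']})

# none of the float ids match the string ids, because '1' != 1.0
print(item_metrics['item_id'].isin(attributes['item_id']).any())  # False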

While this is easy to fix after the call, e.g.:

if item_metrics.item_id.dtype == 'float64':
    item_metrics['item_id'] = item_metrics['item_id'].astype(int).astype(str)

this is unintuitive and takes time and effort to catch.

If the item_id values are non-numeric strings, on the other hand, there is no issue.

To Reproduce

import pandas as pd
import numpy as np
import json
from gluonts.evaluation import Evaluator
from gluonts.evaluation import backtest
from gluonts.dataset.common import ListDataset, TrainDatasets, MetaData
from gluonts.evaluation import make_evaluation_predictions

# from gluonTS tutorial: https://ts.gluon.ai/tutorials/forecasting/extended_tutorial.html#Training-an-existing-model
# create data
N = 10  # number of time series
T = 24*7  # number of timesteps
prediction_length = 24
freq = "1H"
custom_dataset = np.random.normal(size=(N, T))
start = pd.Timestamp("01-01-2019", freq=freq)  # can be different for each time series

# train dataset: cut the last window of length "prediction_length", add "target" and "start" fields
train_ds = ListDataset(
    [{'item_id': str(i), 'target': x, 'start': start} for (i, x) in enumerate(custom_dataset[:, :-prediction_length])],
    freq=freq
)
# test dataset: use the whole dataset, add "target" and "start" fields
test_ds = ListDataset(
    [{'item_id': str(i), 'target': x, 'start': start} for (i, x) in enumerate(custom_dataset[:, :])],
    freq=freq
)

custom_ds_metadata = MetaData(target={'name': 'target'},
                              time_granularity=freq,
                              prediction_length=prediction_length)

dataset = TrainDatasets(custom_ds_metadata, train_ds, test_ds)

from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx import Trainer

estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer=Trainer(
        ctx="cpu",
        epochs=2,
        learning_rate=1e-3,
        num_batches_per_epoch=100,
    )
)
predictor = estimator.train(train_ds, test_ds)  # test_ds is used as validation data here

evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9], num_workers=None)
back_agg_metrics, back_item_metrics = backtest.backtest_metrics(test_ds, predictor, evaluator)
print(back_item_metrics.item_id.dtype)
back_item_metrics[['item_id', 'MSE']].head()

Error message or code output

The item_id column has dtype float64, even though the item_id values were strings:

float64

item_id | MSE
-- | --
0.0 | 1.958736
1.0 | 1.686057
2.0 | 0.832462
3.0 | 0.455663
4.0 | 1.406261


jaheba commented 2 years ago

Without having looked at this too closely, this seems to be the culprit:

https://github.com/awslabs/gluon-ts/blob/73968889531af054e5c5b57b833dc11b1929233d/src/gluonts/evaluation/_base.py#L247-L252

This forces the float64 dtype on all columns. We should ensure that item_id is not part of that cast, e.g. along the lines of the sketch below.
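A minimal sketch of that idea (illustrative names only, not the actual code in _base.py, and not necessarily how the eventual fix works): build the per-item DataFrame first and cast only the metric columns to float64, leaving item_id untouched.

import numpy as np
import pandas as pd

def build_item_metrics(rows):
    # rows: one dict of metrics per series, each possibly carrying an 'item_id' key
    df = pd.DataFrame(rows)
    # cast only the metric columns; item_id keeps whatever dtype it came in with
    metric_cols = [c for c in df.columns if c != 'item_id']
    df[metric_cols] = df[metric_cols].astype(np.float64)
    return df

rows = [{'item_id': '0', 'MSE': 1.96}, {'item_id': '1', 'MSE': 1.69}]
print(build_item_metrics(rows).dtypes)  # item_id stays object, MSE becomes float64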

However, the Evaluator is something I want to see generally reworked in the future. I don't think we should return a DataFrame at all. I've done some work on this in #1778.

lostella commented 2 years ago

Related to #2100

lostella commented 2 years ago

Fixed in #2183