awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

Inference Single Item on model trained on Multiple Items #3128

Open · Alex-Wenner-FHR opened 9 months ago

Alex-Wenner-FHR commented 9 months ago

I have a TemporalFusionTransformer that was trained with a `PandasDataset.from_long_dataframe(...)`. This `PandasDataset` contains multiple item_ids:

| item_id | ... |
|---------|-----|
| cat1    |     |
| cat2    |     |
| cat3    |     |
| ...     |     |

This dataset includes several past_feat_dynamic_reals and a few static_features.

I want to predict on just one category. However, when I do something like

```python
df = df.loc[df["item_id"] == "cat1"]
sample_group = PandasDataset.from_long_dataframe(df, **same_dataset_spec_used_for_training)
forecasts = model.predict(dataset=sample_group)
next(iter(forecasts))
```

I get the following error:

```
IndexError                                Traceback (most recent call last)
Cell In[124], line 9
      7 model = Pred.deserialize(pathlib.Path("./model"))
      8 forecasts = model.predict(dataset = sample_group)
----> 9 next(iter(forecasts))

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/torch/model/predictor.py:90, in PyTorchPredictor.predict(self, dataset, num_samples)
     87 self.prediction_net.eval()
     89 with torch.no_grad():
---> 90     yield from self.forecast_generator(
     91         inference_data_loader=inference_data_loader,
     92         prediction_net=self.prediction_net,
     93         input_names=self.input_names,
     94         output_transform=self.output_transform,
     95         num_samples=num_samples,
     96     )

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/model/forecast_generator.py:117, in QuantileForecastGenerator.__call__(self, inference_data_loader, prediction_net, input_names, output_transform, num_samples, **kwargs)
    108 def __call__(
    109     self,
    110     inference_data_loader: DataLoader,
   (...)
    115     **kwargs
    116 ) -> Iterator[Forecast]:
--> 117     for batch in inference_data_loader:
    118         inputs = select(input_names, batch, ignore_missing=True)
    119         outputs = predict_to_numpy(prediction_net, inputs)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:111, in TransformedDataset.__iter__(self)
    110 def __iter__(self) -> Iterator[DataEntry]:
--> 111     yield from self.transformation(
    112         self.base_dataset, is_train=self.is_train
    113     )

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/loader.py:50, in Batch.__call__(self, data, is_train)
     49 def __call__(self, data, is_train):
---> 50     yield from batcher(data, self.batch_size)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/itertools.py:131, in batcher.<locals>.get_batch()
    130 def get_batch():
--> 131     return list(itertools.islice(it, batch_size))

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:186, in FlatMapTransformation.__call__(self, data_it, is_train)
    182 def __call__(
    183     self, data_it: Iterable[DataEntry], is_train: bool
    184 ) -> Iterator:
    185     num_idle_transforms = 0
--> 186     for data_entry in data_it:
    187         num_idle_transforms += 1
    188         for result in self.flatmap_transform(data_entry.copy(), is_train):

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

    [... skipping similar frames: MapTransformation.__call__ at line 132 (5 times)]

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:217, in PandasDataset.__iter__(self)
    216 def __iter__(self):
--> 217     yield from self._data_entries
    218     self.unchecked = True

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:188, in PandasDataset._pair_to_dataentry(self, item_id, df)
    179 if not self.unchecked:
    180     assert is_uniform(df.index), (
    181         "Dataframe index is not uniformly spaced. "
    182         "If your dataframe contains data from multiple series in the "
    183         'same column ("long" format), consider constructing the '
    184         "dataset with `PandasDataset.from_long_dataframe` instead."
    185     )
    187 entry = {
--> 188     "start": df.index[0],
    189 }
    191 target = df[self.target].values
    192 target = target[: len(target) - self.future_length]

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/indexes/base.py:5385, in Index.__getitem__(self, key)
   5382 if is_integer(key) or is_float(key):
   5383     # GH#44051 exclude bool, which would return a 2d ndarray
   5384     key = com.cast_scalar_indexer(key)
-> 5385     return getitem(key)
   5387 if isinstance(key, slice):
   5388     # This case is separated from the conditional above to avoid
   5389     # pessimization com.is_bool_indexer and ndim checks.
   5390     return self._getitem_slice(key)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py:379, in DatetimeLikeArrayMixin.__getitem__(self, key)
    372 """
    373 This getitem defers to the underlying array, which by-definition can
    374 only handle list-likes, slices, and integer scalars
    375 """
    376 # Use cast as we know we will get back a DatetimeLikeArray or DTScalar,
    377 # but skip evaluating the Union at runtime for performance
    378 # (see https://github.com/pandas-dev/pandas/pull/44624)
--> 379 result = cast("Union[Self, DTScalarOrNaT]", super().__getitem__(key))
    380 if lib.is_scalar(result):
    381     return result

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py:284, in NDArrayBackedExtensionArray.__getitem__(self, key)
    278 def __getitem__(
    279     self,
    280     key: PositionalIndexer2D,
    281 ) -> Self | Any:
    282     if lib.is_integer(key):
    283         # fast-path
--> 284         result = self._ndarray[key]
    285         if self.ndim == 1:
    286             return self._box_func(result)

IndexError: index 0 is out of bounds for axis 0 with size 0
```

Does anyone have ideas on how to run inference on one item at a time, instead of having to pass multiple items in a dataset at once? The subset has exactly the same shape and dtypes as the training data. Thanks!

Originally posted by @Alex-Wenner-FHR in https://github.com/awslabs/gluonts/discussions/3126

Alex-Wenner-FHR commented 9 months ago

It appears that, when using the same dataset spec with my subset, the other categories are for whatever reason still represented:

```python
# `entry` instead of `iter`, which would shadow the built-in
for entry in ds_val._data_entries.iterable.iterable:
    print(entry)
```

```
('cat1', Empty DataFrame
Columns: [...]
Index: []

[0 rows x 24 columns])
('cat2', Empty DataFrame
Columns: [...]
Index: []

[0 rows x 24 columns])
('cat3', Empty DataFrame
Columns: [...]
Index: []

[0 rows x 24 columns])
```
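
This matches the traceback above, where `df.index[0]` fails on an empty index. One plausible cause, not confirmed in the thread: if the `item_id` column is a pandas categorical, filtering the dataframe keeps the unused categories around, and since `from_long_dataframe` splits the long frame by `item_id`, every filtered-out category can still surface as an empty group. A minimal sketch of that pandas behavior (hypothetical reproduction, not the author's data):

```python
import pandas as pd

# Grouping a filtered frame by a categorical column (observed=False,
# the long-standing pandas default) keeps empty groups for the
# categories that were filtered out.
df = pd.DataFrame(
    {"item_id": pd.Categorical(["cat1", "cat2", "cat3"]), "y": [1.0, 2.0, 3.0]}
)
sub = df.loc[df["item_id"] == "cat1"]
for item_id, group in sub.groupby("item_id", observed=False):
    print(item_id, len(group))  # prints: cat1 1, cat2 0, cat3 0
```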
Alex-Wenner-FHR commented 9 months ago

This is less than ideal, but doing something like the following allows a single item_id to be predicted:

```python
# Keep only the (item_id, dataframe) pairs with more than one row,
# dropping the empty frames left over for the filtered-out categories.
iterable = ds_val._data_entries.iterable.iterable
iterable = [t for t in iterable if len(t[1]) > 1]
ds_val._data_entries.iterable.iterable = tuple(iterable)
```
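
A variant of the same idea that rebuilds the dataset rather than mutating it in place (a sketch only: it still reaches into the private `_data_entries`, and it assumes the `PandasDataset` constructor accepts the same spec kwargs as `from_long_dataframe`, which may not hold for every field):

```python
# Rebuild a PandasDataset from only the non-empty (item_id, dataframe) pairs.
pairs = {
    item_id: frame
    for item_id, frame in ds_val._data_entries.iterable.iterable
    if len(frame) > 0
}
ds_single = PandasDataset(pairs, **same_dataset_spec_used_for_training)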
Alex-Wenner-FHR commented 8 months ago

@lostella - has anyone from the team been able to lend an eye to this?

lostella commented 8 months ago

@Alex-Wenner-FHR `predict` gets a dataset just like `train`: if you want to predict only a specific item_id, you should be able to construct a `PandasDataset` with only a subset of the data and pass that to `predict`. Does that work?

Alex-Wenner-FHR commented 8 months ago

It does not. If you check out the workaround a few comments above, I was able to get it to work by filtering the dataset's internal iterable, but natively it does not!
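
If the categorical `item_id` hypothesis above is right, the native route may work once the unused categories are dropped before building the subset dataset. A sketch reusing the author's placeholder spec (an assumed fix, not confirmed in this thread):

```python
# Cast item_id to plain strings (or use .cat.remove_unused_categories())
# so the filtered-out categories cannot resurface as empty groups.
single = df.loc[df["item_id"] == "cat1"].copy()
single["item_id"] = single["item_id"].astype(str)

sample_group = PandasDataset.from_long_dataframe(
    single, **same_dataset_spec_used_for_training
)
forecast = next(iter(model.predict(dataset=sample_group)))
```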