TorchSpatiotemporal / tsl

tsl: a PyTorch library for processing spatiotemporal data.
https://torch-spatiotemporal.readthedocs.io/
MIT License

Cannot retrieve samples from SpatioTemporalDataset when using scalers #44

Closed tboussaid closed 3 days ago

tboussaid commented 3 weeks ago

Hi,

I am using the library to train models on data that I had previously scaled with sklearn's MinMaxScaler. I wanted to use tsl's scalers directly instead, to make the pipeline more flexible and self-contained. I create a SpatioTemporalDataset as follows:

from tsl.data import SpatioTemporalDataset
from tsl.data import WINDOW, HORIZON
from tsl.data.preprocessing import MinMaxScaler

scalers = {
    'target': MinMaxScaler(),
    'forecasts': MinMaxScaler(),
}

torch_dataset = SpatioTemporalDataset(
    target=dataset.dataframe(),
    connectivity=connectivity,
    covariates=dataset.covariates,
    input_map={
        'x': (['target'], WINDOW),
        'u': (['forecasts'], HORIZON),
    },
    target_map={
        'y': (['target'], HORIZON)
    },
    scalers=scalers,
    horizon=horizon,
    window=window,
    stride=stride
)

When printing the dataset through print(torch_dataset), I get:

SpatioTemporalDataset(n_samples=23377, n_nodes=15, n_channels=1)

However, when I try to get the first sample via torch_dataset[0], I get the following error:

File [~/tsl/tsl/lib/python3.11/site-packages/tsl/data/spatiotemporal_dataset.py:763], in SpatioTemporalDataset._add_to_sample(self, out, synch_mode, endpoint, time_index, node_index)
    759     tensor, scaler = self.collate_item_elem(key,
    760                                             time_index=time_index,
    761                                             node_index=node_index)
    762 else:
--> 763     tensor, scaler = self.get_tensor(item.keys[0],
    764                                      preprocess=item.preprocess,
    765                                      time_index=time_index,
    766                                      node_index=node_index)
    767 if endpoint == 'auxiliary':
    768     out[key] = tensor

File [~/tsl/tsl/lib/python3.11/site-packages/tsl/data/spatiotemporal_dataset.py:665], in SpatioTemporalDataset.get_tensor(self, key, preprocess, time_index, node_index)
    663 scaler = None
    664 if key in self.scalers is not None:
--> 665     scaler = self.scalers[key].slice(time_index=time_index,
    666                                      node_index=node_index)
    667     if preprocess:  # transform tensor
    668         x = scaler.transform(x)

File [~/tsl/tsl/lib/python3.11/site-packages/tsl/data/preprocessing/scalers.py:526], in ScalerModule.slice(self, time_index, node_index)
    524     ti_scale = time_index if self.scale.size(t) > 1 else new_axes
    525 if self.n_axis is not None:
--> 526     ni_bias = node_index if self.bias.size(n) > 1 else None
    527     ni_scale = node_index if self.scale.size(n) > 1 else None
    529 # slice params

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I don't see what the problem is here. Could you suggest a solution?

The scalers argument is also available on SpatioTemporalDataModule. Is there any difference between passing the scalers to the SpatioTemporalDataModule and passing them to the SpatioTemporalDataset?

Thank you in advance for your help.

Best,

marshka commented 2 weeks ago

Hi, thanks for using our library! The problem might be in the shapes of the scalers' parameters. When the scalers are fitted in the SpatioTemporalDataModule, the library sets them properly according to the data. In particular, the operation done in the datamodule is:

scaler = scaler.fit(data, mask=mask, keepdims=True)

with keepdims=True ensuring that the scaler's parameters have the same number of dimensions as the data to be scaled. Thus, if you want to set the scalers' parameters manually, you must ensure that the shapes match by adding dummy one-sized dimensions.
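To see why the dimensionality matters, here is a minimal sketch in plain PyTorch (not the tsl API; the tensor shapes are illustrative assumptions). Data in tsl is laid out as (time, nodes, channels); reducing with keepdim=True leaves one-sized dummy axes that broadcast against the data, while reducing without it drops axes, so a later lookup like bias.size(1) on the node dimension raises exactly the IndexError from the traceback:

```python
import torch

# Dummy spatiotemporal data: (time, nodes, channels)
x = torch.rand(100, 15, 1)

# With keepdim=True the fitted parameters keep shape (1, 1, 1),
# so they broadcast against x and still expose a node dimension.
bias = x.amin(dim=(0, 1), keepdim=True)          # shape (1, 1, 1)
scale = x.amax(dim=(0, 1), keepdim=True) - bias  # shape (1, 1, 1)
x_scaled = (x - bias) / scale                    # min-max scaling

# Without keepdim the reduced axes disappear entirely:
bad_bias = x.amin(dim=(0, 1))                    # shape (1,)
# bad_bias.size(1) now raises:
# IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

This is the same failure mode as in ScalerModule.slice, which indexes the node axis of the scaler's bias and scale tensors.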