QData / spacetimeformer

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."
https://arxiv.org/abs/2109.12218
MIT License

multiple-step ahead prediction for PeMS dataset #27

Closed rottenivy closed 2 years ago

rottenivy commented 2 years ago

Hi authors, thanks for sharing the code for this paper. I have a question about implementing these models: multi-step-ahead prediction seems to be handled differently across models for the PeMS dataset.

For example, in the LSTM model, the decoder's output at each timestep is fed back as the decoder input to generate the prediction for the next step, so no ground truth data is used at inference time. In the spacetimeformer model, however, a mask simply hides future positions while ground truth data is still used as the decoder input. Essentially, I think this is equivalent to a rolling one-step prediction, which is different from the multi-step-ahead prediction the LSTM model performs.
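For clarity, here is a minimal sketch (not the repository's code; all module and tensor names are hypothetical) contrasting the two decoding schemes being compared: an autoregressive LSTM rollout that feeds each prediction back in, and a one-shot Transformer decode over the whole horizon.

```python
import torch
import torch.nn as nn

horizon, batch, d_model = 12, 8, 32

# --- Autoregressive rollout (LSTM-style): feed each prediction back in ---
cell = nn.LSTMCell(d_model, d_model)
proj = nn.Linear(d_model, d_model)

h = torch.zeros(batch, d_model)           # hidden state handed off by the encoder
c = torch.zeros(batch, d_model)
step_input = torch.zeros(batch, d_model)  # e.g. the last observed value
preds = []
for _ in range(horizon):
    h, c = cell(step_input, (h, c))
    y_hat = proj(h)
    preds.append(y_hat)
    step_input = y_hat                    # prediction becomes the next input
lstm_preds = torch.stack(preds, dim=1)    # (batch, horizon, d_model)

# --- One-shot decoding (Transformer-style): predict all horizon steps at once ---
decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)

memory = torch.randn(batch, 10, d_model)           # encoder output (hypothetical)
dec_inputs = torch.zeros(batch, horizon, d_model)  # placeholder inputs, no ground truth
tf_preds = decoder(dec_inputs, memory)             # (batch, horizon, d_model)
```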

jakegrigsby commented 2 years ago

Hi, it's true that the prediction loop differs between the Transformer models and the LSTM. The LSTM generates each timestep of the output in sequence, while the Transformers predict every timestep in one forward pass. We set this up by passing zeros as the decoder inputs and turning off decoder masking so that the predictions can share information. The code makes that a little confusing because there are mask options that are disabled. We also pass the true dataset target values into the training step, but replace them with zeros before they go into the decoder, so inference does not require ground truth data.

https://github.com/QData/spacetimeformer/blob/1ee8c8143a218b5d70a688c4929ba9bca7441914/spacetimeformer/spacetimeformer_model/spacetimeformer_model.py#L185-L193

Because all the decoder tokens are zero, there is no information leak from future timesteps and we can skip the mask. I'm working on some changes to make inference simpler by creating the zero sequence automatically.
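As a rough illustration of that idea (a sketch with hypothetical helper names, not the linked implementation), the ground-truth targets are only used to build a zero placeholder of the right shape for the decoder, so at inference time the same placeholder can be built without any future values:

```python
import torch

def make_decoder_input(y_true: torch.Tensor) -> torch.Tensor:
    # y_true: (batch, horizon, features) targets from the dataset.
    # The values are discarded; only shape, dtype, and device are reused.
    return torch.zeros_like(y_true)

def make_inference_input(batch: int, horizon: int, features: int,
                         device: torch.device) -> torch.Tensor:
    # At inference time, build the same zero placeholder without ground truth.
    return torch.zeros(batch, horizon, features, device=device)
```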