ML4ITS / mtad-gat-pytorch

PyTorch implementation of MTAD-GAT (Multivariate Time-Series Anomaly Detection via Graph Attention Networks) by Zhao et al. (2020, https://arxiv.org/abs/2009.02040).
MIT License

Why use shuffle with time-series data? #26

Closed cloudhs7 closed 1 year ago

cloudhs7 commented 1 year ago

Hi. Thanks for your wonderful work!

I'm curious why 'shuffle = True' is the default option in the implementation below, since the data is time-series data.

def create_data_loaders(train_dataset, batch_size, val_split=0.1, shuffle=True, test_dataset=None):

Is there any reason to shuffle the time-series data? (And can the GAT still learn time-oriented features from shuffled data?)

srigas commented 1 year ago

It is true that when dealing with time-series data you rarely shuffle, so that the temporal order can be learned by the model. However, this is not always the case when handling time series with a sliding-window approach. In that regime, you don't treat a single timestamp as a data point; rather, you treat w consecutive timestamps as a single data point and use them for your prediction (in this case, forecasting of the next value and reconstruction of the measured values). For this reason, shuffling your data is fine, because you have essentially split a single time series into many smaller ones, each of which preserves its internal temporal order.

On one hand, this feels like introducing some data leakage into your training set (for example, two data points sharing 80% of their timestamps could end up one in the training set and one in the validation set); on the other hand, your model may train faster and sometimes more effectively.
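To make this concrete, here is a minimal sketch (not the repo's actual code; variable names and the toy series are my own) showing that shuffling in the sliding-window regime permutes whole windows while leaving the temporal order inside each window intact, and also showing the overlap between neighbouring windows that underlies the leakage concern:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

w = 5  # sliding-window length (hypothetical choice for illustration)
series = np.arange(20, dtype=np.float32).reshape(-1, 1)  # toy univariate series

# Build overlapping windows: window i covers timestamps [i, i + w)
windows = np.stack([series[i : i + w] for i in range(len(series) - w)])
targets = series[w:]  # forecast target: the value right after each window

# Consecutive windows share w - 1 of their w timestamps (80% here),
# which is the potential train/validation leakage mentioned above.
overlap = np.intersect1d(windows[0], windows[1]).size  # 4 shared timestamps

dataset = TensorDataset(torch.from_numpy(windows), torch.from_numpy(targets))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Shuffling permutes whole windows; the order *inside* each window is
# untouched, so every sample the model sees is still an ordered subsequence.
for batch_windows, batch_targets in loader:
    assert torch.all(batch_windows[:, 1:] >= batch_windows[:, :-1])
```

Each iteration of the loader yields windows in a random order, but the monotonicity check inside the loop passes because shuffling never reorders timestamps within a window.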