Closed cloudhs7 closed 1 year ago
It is true that time-series data is rarely shuffled, so that the model can learn the temporal order. However, this does not always hold when time-series are handled with a sliding-window approach. In this regime, you don't treat a single timestamp as a data point; rather, you treat w consecutive timestamps as a single data point and use them for your prediction (in this case, forecasting of the next value and reconstruction of the measured value). For this reason, shuffling your data is okay: the temporal order is preserved inside each window, and you have essentially split a single time-series into several smaller ones. On one hand, this feels like introducing some data leakage (for example, two data points that share 80% of their timestamps could end up one in the training set and the other in the validation set), but on the other hand your model may train faster and sometimes more efficiently.
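To make the sliding-window argument concrete, here is a minimal sketch in plain Python. It is not the repository's code: `make_windows`, `overlap_fraction`, the toy series, the window size `w = 5`, and the 20% validation fraction are all hypothetical choices for illustration. It shows that shuffling only reorders the windows, while the values inside each window stay in temporal order, and that a shuffled split can put heavily overlapping windows on both sides of the train/validation boundary.

```python
import random

def make_windows(series, w):
    """Pair each run of w consecutive values with the value that follows it."""
    return [(series[i:i + w], series[i + w]) for i in range(len(series) - w)]

def overlap_fraction(start_a, start_b, w):
    """Fraction of timestamps shared by two length-w windows starting at the given indices."""
    return max(0, w - abs(start_a - start_b)) / w

series = list(range(20))  # toy series whose values equal their timestamps
w = 5                     # hypothetical window size
samples = make_windows(series, w)

# Shuffle the sample indices and hold out a fraction for validation.
rng = random.Random(0)
indices = list(range(len(samples)))
rng.shuffle(indices)
n_val = int(0.2 * len(samples))
val_idx, train_idx = indices[:n_val], indices[n_val:]

# Shuffling reorders samples; the values inside each window keep their order.
first_window, first_target = samples[train_idx[0]]

# Largest overlap between any validation window and any training window.
worst_overlap = max(overlap_fraction(v, t, w) for v in val_idx for t in train_idx)
print(first_window, first_target, worst_overlap)
```

With `w = 5`, two windows that start one step apart share 4 of their 5 timestamps, which is the 80% overlap mentioned above. If that leakage matters for your evaluation, one option is to split chronologically first and shuffle only the training windows.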
Hi. Thanks for your wonderful work!
I'm curious why 'shuffle=True' is the default option in the implementation below, given that the data is time-series data.
def create_data_loaders(train_dataset, batch_size, val_split=0.1, shuffle=True, test_dataset=None):
Is there a reason to shuffle the time-series data? (Or can the GAT still capture time-oriented features even when the data is shuffled?)