ericleonardo opened this issue 1 month ago (status: Open)
Hi Eric, I appreciate you pointing this out! You are right. After doing some research, I learned that shuffling the data when splitting it into train and test sets can lead to data leakage, especially in financial time series, where the order of observations is crucial. I will disable shuffling so that the temporal order of the data is preserved. Thank you for your valuable feedback.
Hi! Very interesting work! But I think you should disable shuffling when splitting the data. `train_test_split` shuffles the data by default; you can pass `shuffle=False` to keep future data from leaking into training. Financial time series should never be shuffled or randomized when splitting into train and test sets. I see you got 75% classification accuracy, possibly because of leakage. Set `shuffle=False` and rerun to check. Thank you!
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
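For reference, here is a minimal sketch of what the suggested change might look like. The arrays `t`, `X`, and `y` are hypothetical stand-ins (this is not the repository's actual data or code), and the sketch assumes the rows are already sorted by time:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 observations already sorted by time.
t = np.arange(100)                        # time index, oldest first
X = np.column_stack([t, t % 7])           # placeholder features
y = t % 2                                 # placeholder class labels

# shuffle=False keeps the original row order, so the first 80% of rows
# become the training set and the last 20% become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Sanity check: every training observation precedes every test observation.
last_train_time = X_train[:, 0].max()
first_test_time = X_test[:, 0].min()
print(last_train_time < first_test_time)
```

If the check prints `False` on real data, the rows were probably not sorted by time before the split, which would need fixing first.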