Bahraleloom / Bitcoin-Price-Prediction-Historical-Analysis-and-Future-Trends

This repository delves into the analysis and prediction of Bitcoin prices using various data science techniques.

shuffled data in train_test_split #1

Open ericleonardo opened 1 month ago

ericleonardo commented 1 month ago

Hi! Very interesting work! But I think you should disable shuffling when splitting the data. train_test_split shuffles data by default; you can pass shuffle=False to keep future data from leaking into training. Financial time series should never be shuffled or randomized when split into train and test sets. I see you got 75% classification accuracy, which may be due to leakage. Set shuffle=False and rerun to check. Thank you!
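A minimal sketch of the difference, not taken from the repository: the array X below is a synthetic stand-in whose only feature is the row's time index, and the 80/20 split ratio is an assumption. It compares the default shuffled split with shuffle=False and checks whether any training row comes later in time than a test row.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a chronologically ordered feature table
# (assumption: rows are already sorted from oldest to newest).
n_samples = 100
X = np.arange(n_samples).reshape(-1, 1)  # the single feature is the time index
y = np.arange(n_samples) % 2             # placeholder labels, not real targets

# Default behaviour (shuffle=True): test rows are scattered through time.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# shuffle=False: the first 80% of rows are used for training,
# the last 20% for testing, so the original order is kept.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# A split "leaks" here if some training row is newer than some test row.
shuffled_leaks = bool(X_train_s.max() > X_test_s.min())
ordered_leaks = bool(X_train.max() > X_test.min())

print("shuffled split has train rows after test rows:", shuffled_leaks)
print("ordered split has train rows after test rows:", ordered_leaks)
```

With shuffle=False the test set is simply the tail of the data, so the rows must be sorted by date before the call.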

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Bahraleloom commented 1 month ago

Hi Eric, I appreciate you pointing this out! You are right: after some searching I learned that shuffling the data when splitting it into train and test sets can lead to data leakage, especially in financial time series, where the order of observations is crucial. I will disable shuffling so that the temporal order of the data is preserved. Thank you for your valuable feedback.
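One way to confirm that temporal order is preserved is to check that the newest training date is older than the oldest test date. The sketch below uses only the standard library; the helper chronological_split and the (date, price) rows are hypothetical and the prices are made up, so it illustrates the check rather than the repository's actual pipeline.

```python
from datetime import date, timedelta


def chronological_split(rows, test_fraction=0.2):
    """Split rows that are sorted oldest to newest, without shuffling.

    Hypothetical helper: the last `test_fraction` of rows form the test set.
    """
    n_test = int(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


# Made-up daily records of the form (date, closing price).
start = date(2024, 1, 1)
rows = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]

# Sort by date first so that position in the list reflects time.
rows.sort(key=lambda r: r[0])

train, test = chronological_split(rows, test_fraction=0.2)

# Temporal order holds if every training date precedes every test date.
order_preserved = train[-1][0] < test[0][0]
print("train ends:", train[-1][0], "| test starts:", test[0][0])
print("temporal order preserved:", order_preserved)
```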