Rachnog / Deep-Trading

Algorithmic trading with deep learning experiments
1.43k stars 695 forks

Wrong data preprocessing leading to better results | Multimodal project #18

Open Dudeldu opened 5 years ago

Dudeldu commented 5 years ago

Hi Alex, I really like your tutorials and have used them as a good starting point for my own projects ;) but I think there is a major error in the preprocessing performed by the split_into_XY function in the process_data module of the multimodal project.

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i:i+window+forecast][3])

With the code above, the regression label is computed from a slice that starts at i, so it overlaps the input window: the training data contain the labels!!! In general, the idea behind it isn't clear to me. First, the code should be replaced with (that's for sure):

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i+window+forecast])

But on the other hand, I don't understand why you are using the standard deviation along that specific axis. Shouldn't it be:

x_i = data_chng_train[i:i+window]
y_i = data_chng_train[i+window+forecast][3]  # Using the close price [3] as label

Then, obviously, all results change substantially and get worse:
figure_1
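
For reference, here is a minimal, self-contained sketch of a split that keeps the label strictly outside the input window, using the same indexing as the last snippet above. The function name split_into_xy, the target_col parameter, and the toy array are assumptions made for illustration; this is not the project's actual code.

```python
import numpy as np

def split_into_xy(data, window, forecast, target_col=3):
    """Hypothetical helper: build (X, y) pairs with no overlap between
    the input window and the row the label is taken from."""
    data = np.asarray(data, dtype=float)
    X, y = [], []
    # Stop early enough that the label row i + window + forecast exists.
    for i in range(len(data) - window - forecast):
        X.append(data[i:i + window])                       # rows i .. i+window-1
        y.append(data[i + window + forecast, target_col])  # one future row, one column
    return np.array(X), np.array(y)

# Toy data (assumption): 10 time steps, 5 columns, strictly increasing values,
# so any label that leaked into a window would be easy to spot.
toy = np.arange(50, dtype=float).reshape(10, 5)
X, y = split_into_xy(toy, window=3, forecast=1)
```

Because the toy values increase with the row index, every label here is larger than every value in its own window, which is a quick sanity check that nothing from the future ends up in the inputs.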

Rachnog commented 5 years ago

Hi @Dudeldu, you're totally right, definitely my bad. Will fix it, thanks!