cderinbogaz / inpredo

Inpredo is a Deep Learning tool which looks into financial charts and predicts stock movements.
https://towardsdatascience.com/making-a-i-that-looks-into-trade-charts-62e7d51edcba
MIT License

Very low accuracy? #17

Closed. sword134 closed this issue 3 years ago

sword134 commented 3 years ago

Is anyone else here struggling with very low accuracy? I can't seem to get it above 51%, so it's about as good as a coin toss. I have an equal number of buy and sell images in my training set as well as in my validation set.

cderinbogaz commented 3 years ago

How big is your training dataset? How many samples are there in each of the folders?

sword134 commented 3 years ago

There are over 800 images each in the buy and sell folders, with exactly the same number in both. The same goes for the validation set, although there are only 84 images in each of its buy and sell folders.
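For scale, a small validation set makes accuracy itself a noisy number. The sketch below is not from the repo: it uses the normal approximation to the binomial to estimate the range of accuracies a pure coin-toss classifier would score on 2 x 84 = 168 balanced validation images. The function name `chance_interval` and the 95% level (z = 1.96) are assumptions made for illustration.

```python
import math

def chance_interval(n_samples, z=1.96):
    """Approximate range of accuracies a 50/50 guesser produces on n_samples.

    Uses the normal approximation to the binomial; z=1.96 gives roughly
    a 95% interval.
    """
    std_err = math.sqrt(0.5 * 0.5 / n_samples)
    return 0.5 - z * std_err, 0.5 + z * std_err

# 84 buy + 84 sell validation images, as described in the comment above.
low, high = chance_interval(2 * 84)
print(f"coin-toss accuracy range: {low:.3f} to {high:.3f}")
```

Under these assumptions, 51% falls well inside the range, so on a validation set of this size it cannot be told apart from guessing.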

cderinbogaz commented 3 years ago

Can you increase the number of samples and try again? Double the amount of samples if possible.

sword134 commented 3 years ago

How can I? I am using daily SPY OHLC data from 2005. I can switch to hourly data, of course.

sword134 commented 3 years ago

I couldn't get hourly data that far back, so instead I settled on daily SPY data from 2000 until today. I've got almost 1000 images in train buy and 1000 in train sell. My validation set has 200 images in each category. My accuracy, however, is still abysmal.

munkh-erdene commented 3 years ago

You can download historical data from here.

https://www.investing.com/etfs/spdr-s-p-500-historical-data

cderinbogaz commented 3 years ago

@sword134 there is no guarantee that the AI will find patterns in the supplied data. In the past I have used historical data on BTC-USD. My hypothesis is that, since the BTC-USD market is less complex than SPY, there were more patterns in it. You can give it a try with different markets, such as the gold-USD market. It has more similarities to BTC-USD than SPY does.

cderinbogaz commented 3 years ago

@munkh-erdene did you try it out with the SPY hourly data? If so, can you tell us what the result was?

munkh-erdene commented 3 years ago

You can download from here ...

https://www.dukascopy.com/trading-tools/widgets/quotes/historical_data_feed

sword134 commented 3 years ago

@cderinbogaz I got more data points from running daily data on SPY since 2000 than from hourly data on SPY since 2019 (730 days). So I went with the daily data from 2000 and got what I wrote earlier in terms of number of images and results.

sword134 commented 3 years ago

@munkh-erdene I am using yfinance and would like to stick to that. I don't think the problem is the amount of data, because I have plenty.

cderinbogaz commented 3 years ago

@sword134 did you give it a try with gold-USD or other markets as well? As I said, this model was not tested on SPY but on BTC-USD. The AI might not be able to find correlations in the SPY market.

sword134 commented 3 years ago

@cderinbogaz tested with Bitcoin hourly data. I got 6828 training samples and 1600 validation samples. Accuracy is still hovering around 50% :/

cderinbogaz commented 3 years ago

@sword134 how many epochs are you training for?

sword134 commented 3 years ago

@cderinbogaz 250 for starters. But there is no improvement in the accuracy, so there is no reason to train for longer.

hxapartners commented 3 years ago

The code classifies the data by looking backward: it produces the buy and sell images based on what has already happened. For example, take EUR/USD and assume the 10:00 am price is 1.40, the 10:00 pm price is 1.41, and the 10:00 am (next day) price is 1.39. The code currently labels the 10:00 pm image as BUY, because the price rose from 1.40 to 1.41. The correct label would be SELL: at 10:00 pm we need to look forward to see what happens in the future, and at 10:00 am the next day the price was 1.39, which is less than the 1.41 at 10:00 pm.

If you do the classification correctly, the accuracy drops dramatically to 50%.
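The difference between the two labelling rules can be sketched on the three prices from the example above. This is an illustration only, not the repo's code: `label_backward` and `label_forward` are hypothetical names, and each bar is reduced to a single closing price.

```python
def label_backward(closes, i):
    """Label bar i by the move that already happened (leaks the answer)."""
    return "buy" if closes[i] > closes[i - 1] else "sell"

def label_forward(closes, i):
    """Label bar i by the move that happens after it (what a trader needs)."""
    return "buy" if closes[i + 1] > closes[i] else "sell"

# 10:00 am, 10:00 pm, 10:00 am next day (values from the example above).
closes = [1.40, 1.41, 1.39]

print(label_backward(closes, 1))  # label based on the 1.40 -> 1.41 rise
print(label_forward(closes, 1))   # label based on the 1.41 -> 1.39 drop
```

With the backward rule, the label describes a move already visible in the chart image, so a classifier can score well without predicting anything.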

sword134 commented 3 years ago

@hxapartners aha, I knew that I wasn't doing it wrong. I was using my own dataset, after all, which DOESN'T look backwards. This repo is simply "wrong".

cderinbogaz commented 3 years ago

I have written this many times in the past and I am writing it again. I trained this on BTCUSD data, and for that you need to flip the data because it is ordered from current time to future. On forex data it's the opposite, and therefore you don't need to flip the data. If it's 50%, that means it didn't find any valuable correlation, contrary to what @sword134 is claiming by calling the repo "wrong". It's an image classifier that works by generating financial chart data, that's it.
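One way to avoid depending on which market's file needs flipping is to check the timestamps and reverse only when necessary. The following is a minimal sketch, not the repo's code: the row layout (ISO timestamp string, close price) and the name `ensure_oldest_first` are assumptions for illustration.

```python
from datetime import datetime

def ensure_oldest_first(rows):
    """Return rows ordered oldest to newest.

    rows is a list of (iso_timestamp, close) tuples that is already sorted
    in one direction; it is reversed only if the first row is the newest.
    """
    if len(rows) < 2:
        return list(rows)
    first = datetime.fromisoformat(rows[0][0])
    last = datetime.fromisoformat(rows[-1][0])
    return list(reversed(rows)) if first > last else list(rows)

# Hypothetical newest-first export.
newest_first = [
    ("2021-01-03T00:00:00", 3.0),
    ("2021-01-02T00:00:00", 2.0),
    ("2021-01-01T00:00:00", 1.0),
]
print(ensure_oldest_first(newest_first)[0])
```

Normalising the order before generating images means the same labelling rule applies to every data source.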