Open penguinshin opened 6 years ago

In main.py, you standardize the Q-vector (I assume this is the discounted cumulative reward) via its mean and std. Why do you do this? Aren't you normalizing out any notion of positive/negative returns?

The reason behind this is much the same as in supervised learning: it makes learning by backprop more efficient. An in-depth discussion can be found here: https://datascience.stackexchange.com/questions/20098/why-do-we-normalize-the-discounted-rewards-when-doing-policy-gradient-reinforcem
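For concreteness, here is a minimal sketch of the usual pattern (a hypothetical helper, not the repo's actual code): compute the discounted cumulative returns, then subtract the mean and divide by the standard deviation before using them to weight the policy gradient. Note that the relative ordering is preserved: actions that did better than the batch average still get positive weight, and worse-than-average actions get negative weight.

```python
import numpy as np

def discount_and_standardize(rewards, gamma=0.99, eps=1e-8):
    """Compute discounted cumulative returns, then standardize them.

    Standardizing (zero mean, unit std) keeps the scale of the
    policy-gradient targets well-conditioned while preserving which
    actions were better or worse than the batch average.
    """
    q = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so that q[t] = r[t] + gamma * q[t+1]
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return (q - q.mean()) / (q.std() + eps)

# Example: a short episode of raw rewards
print(discount_and_standardize([0.0, 0.0, 1.0, -0.5]))
```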
Thank you, hugely helpful. Are you still working on RL and trading?
Yes, that's my primary interest. Hope you find this repo helpful, and good luck with your research!
Thanks! By any chance do you know where to find limited-depth (maybe 5 levels), frequently updated orderbook data for crypto?
You'll need to get the data from the exchange itself. ccxt is a good general-purpose way to do this; a more efficient option is to use the exchange's own API (websocket is best) to stream the data.
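For example, here is a minimal polling sketch with ccxt (Binance and BTC/USDT are just illustrative choices; any ccxt-supported exchange works similarly). For lower latency you'd switch to the exchange's websocket stream instead of REST polling.

```python
# pip install ccxt
import time
import ccxt

exchange = ccxt.binance()  # example exchange; swap in any ccxt-supported one
symbol = 'BTC/USDT'

# Poll a 5-level order book snapshot in a simple loop (REST).
for _ in range(3):
    book = exchange.fetch_order_book(symbol, limit=5)
    best_bid = book['bids'][0] if book['bids'] else None  # [price, amount]
    best_ask = book['asks'][0] if book['asks'] else None
    print(book['datetime'], 'bid:', best_bid, 'ask:', best_ask)
    time.sleep(exchange.rateLimit / 1000)  # respect the exchange's rate limit
```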