Open penguinshin opened 6 years ago

In main.py, you standardize the Q-vector (I assume this is the discounted cumulative reward) via its mean and std. Why do you do this? Aren't you normalizing out any notion of positive/negative returns?

The reason behind this is much the same as in supervised learning: it makes learning by backprop more efficient. An in-depth discussion can be found here: https://datascience.stackexchange.com/questions/20098/why-do-we-normalize-the-discounted-rewards-when-doing-policy-gradient-reinforcem
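For concreteness, here is a minimal sketch of the usual pattern (a hypothetical helper, not the repo's actual code): compute the discounted cumulative returns, then subtract the mean and divide by the standard deviation before using them to weight the policy gradient. Note that the relative ordering is preserved: actions that did better than the batch average still get positive weight, and worse-than-average actions get negative weight.

```python
import numpy as np

def discount_and_standardize(rewards, gamma=0.99, eps=1e-8):
    """Compute discounted cumulative returns, then standardize them.

    Standardizing (zero mean, unit std) keeps the scale of the
    policy-gradient targets well-conditioned while preserving which
    actions were better or worse than the batch average.
    """
    q = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so that q[t] = r[t] + gamma * q[t+1]
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return (q - q.mean()) / (q.std() + eps)

# Example: a short episode of raw rewards
print(discount_and_standardize([0.0, 0.0, 1.0, -0.5]))
```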
Thank you, hugely helpful. Are you still working on RL and trading?
Yes, that's my primary interest. Hope you find this repo helpful, and good luck with your research!
Thanks! By any chance do you know where to find limited-depth (maybe 5 levels), frequently updated orderbook data for crypto?
You'll need to get the data from the exchange itself. ccxt is a good general-purpose way to do this; a more efficient option is to use the exchange's own API (websocket is best) to stream the data.
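For example, here is a minimal polling sketch with ccxt (Binance and BTC/USDT are just illustrative choices; any ccxt-supported exchange works similarly). For lower latency you'd switch to the exchange's websocket stream instead of REST polling.

```python
# pip install ccxt
import time
import ccxt

exchange = ccxt.binance()  # example exchange; swap in any ccxt-supported one
symbol = 'BTC/USDT'

# Poll a 5-level order book snapshot in a simple loop (REST).
for _ in range(3):
    book = exchange.fetch_order_book(symbol, limit=5)
    best_bid = book['bids'][0] if book['bids'] else None  # [price, amount]
    best_ask = book['asks'][0] if book['asks'] else None
    print(book['datetime'], 'bid:', best_bid, 'ask:', best_ask)
    time.sleep(exchange.rateLimit / 1000)  # respect the exchange's rate limit
```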