Hi @chingchou888,
These are both good questions.
The agent's reward is defined at lines 115-117 (and line 107) in train.py. The reward is essentially the net unrealized profit (meaning the stocks are still in the portfolio and have not been cashed out yet) evaluated at each action step the agent takes. In other words, we want to train the agent to make profits. I'll add an explanation of the agent's learning mechanism to README.md when I have time.
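To make that concrete, here is a minimal sketch of a net-unrealized-profit reward, assuming the open positions are tracked as a list of (shares, buy price) pairs; the function name `unrealized_profit_reward` and all variable names are hypothetical, not the actual identifiers in train.py:

```python
# Minimal sketch of a net-unrealized-profit reward. All names here are
# hypothetical and do NOT match the actual identifiers in train.py.
def unrealized_profit_reward(holdings, current_price):
    """Paper profit on shares still held (not yet cashed out).

    holdings: list of (num_shares, buy_price) tuples for open positions.
    current_price: the stock price at the current action step.
    """
    return sum(shares * (current_price - buy_price)
               for shares, buy_price in holdings)

# Example: 10 shares bought at $100 and 5 at $110; the price is now $120.
holdings = [(10, 100.0), (5, 110.0)]
print(unrealized_profit_reward(holdings, 120.0))  # 10*20 + 5*10 = 250.0
```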
If you feel the learning rate of the DDPG model is too small, you can adjust the learning-rate hyperparameters (lines 144 and 145 in DDPG.py). Note that with larger learning rates the DDPG model may converge faster, or may not converge at all. For hyperparameter tuning there are a lot of good tools you can use (e.g. Ray Tune: https://ray.readthedocs.io/en/latest/tune.html).
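For reference, DDPG typically keeps separate learning rates for the actor and the critic, with the critic learning faster. The snippet below is a sketch assuming a TF2/Keras implementation; the names `actor_lr` and `critic_lr` and their values are illustrative, not the actual contents of lines 144-145 in DDPG.py:

```python
from tensorflow.keras.optimizers import Adam

# Illustrative values only; the real hyperparameters live at
# lines 144-145 of DDPG.py. The actor is usually given a smaller
# learning rate than the critic for training stability.
actor_lr = 1e-4    # try e.g. 1e-3 for faster (possibly unstable) learning
critic_lr = 1e-3

actor_optimizer = Adam(learning_rate=actor_lr)
critic_optimizer = Adam(learning_rate=critic_lr)
```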
Let me know if you have other questions, and please phrase them as concretely as possible so that others can refer to this thread as well. Thanks!
Hi Albert,
I am trying to train a DDPG model based on the code in your GitHub repository, but I always get a large gap between the reward and the real price. Could you help me understand why the learning process of DDPG is so slow and why the reward balance is so small?
Thanks in advance for your reply!
Best regards, CCC