Loss function only contains instantaneous reward but not cumulated reward

ZhengyaoJiang / PGPortfolio

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" (https://arxiv.org/pdf/1706.10059.pdf).

GNU General Public License v3.0

1.74k stars 750 forks

Loss function only contains instantaneous reward but not cumulated reward #109

Open AchillesJJ opened 6 years ago

AchillesJJ commented 6 years ago

As shown in nnagent.py, the author uses the average return of a batch as the loss function. However, it seems that such a loss function contains only the instantaneous reward, not the average cumulative reward. To be specific, suppose we have a batch of experience as follows:

mini_batch = $(s_t, a_t, r_t, \ldots, s_{t+T}, a_{t+T}, r_{t+T})$
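For reference, a batch-averaged loss of the kind described above can be sketched as follows. This is only a minimal illustration, not the code in nnagent.py: the function name `batch_loss`, the toy numbers, and the assumption that each sample's reward is the log of the one-period portfolio growth (with no commission) are all introduced here for the example.

```python
import math


def batch_loss(weights_batch, price_relatives_batch):
    """Negative mean log return over a batch of periods.

    Assumption: the reward of each sample is log(w_t . y_t), where w_t are
    the portfolio weights and y_t the price relatives, with no commission.
    """
    log_returns = [
        math.log(sum(w * y for w, y in zip(weights, relatives)))
        for weights, relatives in zip(weights_batch, price_relatives_batch)
    ]
    return -sum(log_returns) / len(log_returns)


# Hypothetical batch: two periods, two assets (cash and one coin).
weights_batch = [(0.5, 0.5), (0.0, 1.0)]
price_relatives_batch = [(1.0, 1.1), (1.0, 1.0)]
loss = batch_loss(weights_batch, price_relatives_batch)
```

Each term in the average depends only on the weights and price relatives of its own period, which is the "instantaneous reward" structure the question refers to.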

ZhengyaoJiang commented 6 years ago

> However, it seems that such a loss function contains only the instantaneous reward, not the average cumulative reward.

If there is no commission fee, the action does not affect the state transition, so optimizing the immediate rewards is equivalent to optimizing the long-term value. This property, together with the differentiable reward function, gives superior sample efficiency compared with general-purpose RL. To deal with the commission fee, we treat it as a regularization term.
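The equivalence claimed above can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not code from the repository: the price relatives, the candidate weight set, and the function names are all made up, and commission is assumed to be zero so that prices evolve independently of the chosen weights.

```python
import itertools
import math

# Hypothetical price relatives (close_t / close_{t-1}) for (cash, asset)
# over three periods. These numbers are invented for illustration.
PRICE_RELATIVES = [
    (1.00, 1.05),
    (1.00, 0.97),
    (1.00, 1.02),
]

# A small discrete set of candidate weight vectors (cash, asset).
CANDIDATE_WEIGHTS = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]


def step_log_return(weights, relatives):
    """Instantaneous reward: log of the one-period portfolio growth."""
    return math.log(sum(w * y for w, y in zip(weights, relatives)))


def cumulative_log_return(weight_sequence):
    """Long-term value without commission: sum of per-step log returns."""
    return sum(
        step_log_return(w, y) for w, y in zip(weight_sequence, PRICE_RELATIVES)
    )


# Greedy: choose the best weights for each period independently.
greedy = [
    max(CANDIDATE_WEIGHTS, key=lambda w: step_log_return(w, y))
    for y in PRICE_RELATIVES
]

# Exhaustive: search over every possible sequence of weights.
best_sequence = max(
    itertools.product(CANDIDATE_WEIGHTS, repeat=len(PRICE_RELATIVES)),
    key=cumulative_log_return,
)

print(tuple(greedy) == best_sequence)
```

Because the cumulative log return is a sum of terms that each depend on a single period's weights, the greedy per-period choice and the exhaustive search over sequences agree here. Once a commission term that depends on the previous period's weights is added, the terms become coupled and this decomposition no longer holds exactly.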