Closed guotong1988 closed 5 years ago
The reward is provided through the `weights`. The `targets` are the actions. You can think of REINFORCE as weighted log-likelihood training, where the reward (or advantage) times the probability is used as the weight and the action is used as the target.
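To make the weighted log-likelihood view concrete, here is a minimal pure-Python sketch. It is not the code from this repository: the function names (`log_softmax`, `weighted_nll`) and the toy numbers are made up for illustration, and the weight is taken to be the plain episode reward at every step.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one step's action logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def weighted_nll(step_logits, targets, weights):
    # Weighted negative log-likelihood of the sampled actions:
    #   loss = - sum_t weights[t] * log p(targets[t] | step t)
    loss = 0.0
    for logits, action, weight in zip(step_logits, targets, weights):
        loss -= weight * log_softmax(logits)[action]
    return loss

# Hypothetical two-step episode with three possible actions per step.
step_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
targets = [0, 2]            # the actions the agent actually sampled
reward = 1.0                # episode reward (could be an advantage instead)
weights = [reward, reward]  # the reward enters only through the weights

loss = weighted_nll(step_logits, targets, weights)
```

Minimizing this loss by gradient descent follows the REINFORCE gradient estimate for the sampled episode: a zero reward contributes nothing, and a larger reward pushes the probability of the sampled actions up more strongly. The targets themselves carry no reward information.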
thank you very much
In https://github.com/crazydonkey200/neural-symbolic-machines/blob/master/nsm/agent_factory.py
In https://github.com/crazydonkey200/neural-symbolic-machines/blob/master/nsm/model_factory.py
I guess `targets` should at least be related to rewards. Thank you!!