Open zonglunli7515 opened 4 days ago
Minimizing the negative log probability maximizes the probability. So if the reward's positive, then the negative log probability of the chosen action will be minimized, increasing the probability of the actions that led to that reward! The loss term itself is positive, but what matters is the direction its gradient pushes the policy.
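To make that concrete, here is a minimal sketch in plain Python. It is not the code from modules.py. It assumes a two-action softmax policy and a loss of the form `-log_pi * reward`, and every name in it (`softmax`, `reinforce_step`, `lr`) is made up for illustration. The gradient is written out by hand so no autograd is needed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, reward, lr=0.1):
    """One gradient-descent step on loss = -log pi(action) * reward."""
    probs = softmax(logits)
    # For a softmax policy:
    # d(-log pi[a] * R) / d logit_k = R * (pi[k] - 1[k == a])
    grads = [reward * (p - (1.0 if k == action else 0.0))
             for k, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]

logits = [0.0, 0.0]   # hypothetical starting policy: both actions equally likely
action = 0            # the action that was actually taken

before = softmax(logits)[action]
loss_before = -math.log(before) * 1.0   # positive, even though the reward is positive

after_pos = softmax(reinforce_step(logits, action, reward=1.0))[action]
after_neg = softmax(reinforce_step(logits, action, reward=-1.0))[action]

print("loss term with reward=+1:", loss_before)
print("pi(action) before:", before)
print("pi(action) after step, reward=+1:", after_pos)
print("pi(action) after step, reward=-1:", after_neg)
```

With a positive reward the loss term is positive, yet the descent step raises the probability of the chosen action. With a negative reward the same step lowers it.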
Many thanks for your speedy response. Initially I thought a reward, being something rewarding, should help reduce the total loss. Minimizing the negative log makes perfect sense.
Hi,
In line 140 of modules.py, if log_pi_stop is negative, then a positive reward would add to the loss function. What is the logic behind it?
I am new to reinforcement learning and PyTorch, apologies if this is quite obvious.
Thanks.
Zonglun