
Thartvigsen / StopAndHop

Code for Stop&Hop, a method for learning to classify irregularly-sampled time series early


Why is the reward penalized in lines 140/141 of modules.py? #3

Status: Open · zonglunli7515 opened this issue 4 days ago

zonglunli7515 commented 4 days ago

Hi,

In line 140 of modules.py, if log_pi_stop is negative, then a positive reward increases the loss. What is the logic behind this?

I am new to reinforcement learning and PyTorch, so apologies if this is obvious.

Thanks.

Zonglun

Thartvigsen commented 4 days ago

Minimizing the negative log probability maximizes the probability. So if the reward is positive, the negative log probability of the chosen action will be minimized (equivalently, its log probability will be maximized), increasing the probability of the actions that led to that reward!
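A short derivation of the point above, assuming the loss term in modules.py has the usual REINFORCE form $\mathcal{L} = -R \log \pi_\theta(a)$, where $R$ is the reward and $\pi_\theta(a)$ the probability of the chosen action (the exact code on line 140 is not reproduced in this thread, so this form is an assumption based on the question and the reply):

```latex
\begin{aligned}
\mathcal{L}(\theta) &= -R \,\log \pi_\theta(a) \\
\nabla_\theta \mathcal{L}(\theta) &= -R \,\nabla_\theta \log \pi_\theta(a) \\
\theta' &= \theta - \eta \,\nabla_\theta \mathcal{L}(\theta) = \theta + \eta R \,\nabla_\theta \log \pi_\theta(a)
\end{aligned}
```

For $R > 0$, a small gradient-descent step with learning rate $\eta$ moves $\theta$ in the direction that increases $\log \pi_\theta(a)$, so the chosen action becomes more likely; for $R < 0$ it moves the opposite way. Since $\log \pi_\theta(a) \le 0$, the loss value itself is non-negative whenever $R > 0$, which is what the question noticed. What drives learning is the direction of the gradient, not the sign of the loss value.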

zonglunli7515 commented 4 days ago


Many thanks for your speedy response. Initially I thought of the reward as something beneficial that should therefore reduce the total loss. Minimizing the negative log probability makes perfect sense.
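A minimal numerical sketch of the same idea. The repository itself uses PyTorch autograd; this version computes the softmax gradient by hand in plain Python so it runs without dependencies. The two-action "stop"/"continue" policy, the function names, and the learning rate are illustrative assumptions, not code from StopAndHop.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss(logits, action, reward):
    """REINFORCE loss for one sampled action: -reward * log pi(action)."""
    probs = softmax(logits)
    return -reward * math.log(probs[action])


def reinforce_grad(logits, action, reward):
    """Gradient of -reward * log softmax(logits)[action] w.r.t. the logits.

    For a softmax policy, d(-log pi_a)/dz_k = pi_k - 1[k == a],
    so the gradient is reward * (pi - onehot(action)).
    """
    probs = softmax(logits)
    return [reward * (p - (1.0 if k == action else 0.0)) for k, p in enumerate(probs)]


def sgd_step(logits, action, reward, lr=0.5):
    """One plain gradient-descent step on the REINFORCE loss."""
    grad = reinforce_grad(logits, action, reward)
    return [z - lr * g for z, g in zip(logits, grad)]


if __name__ == "__main__":
    # Hypothetical two-action policy: index 0 = "stop", index 1 = "continue".
    logits = [0.0, 0.0]
    action = 0  # suppose the policy chose "stop"
    before = softmax(logits)[action]
    for reward in (1.0, -1.0):
        loss = reinforce_loss(logits, action, reward)
        after = softmax(sgd_step(logits, action, reward))[action]
        print(f"reward={reward:+.1f} loss={loss:.4f} p(stop): {before:.4f} -> {after:.4f}")
    # reward=+1.0 loss=0.6931 p(stop): 0.5000 -> 0.6225
    # reward=-1.0 loss=-0.6931 p(stop): 0.5000 -> 0.3775
```

With a positive reward the loss value is positive (0.6931), yet one minimization step raises the probability of the rewarded action from 0.5 to about 0.62; a negative reward lowers it instead.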