TianQi-777 opened this issue 3 years ago
Hi, my confusion is why the loss of `transition_reward_model` is computed as follows (here):

```python
diff = (pred_next_latent_mu - next_h.detach()) / pred_next_latent_sigma
loss = torch.mean(0.5 * diff.pow(2) + torch.log(pred_next_latent_sigma))
```
Especially the term `torch.log(pred_next_latent_sigma)` — can you explain it or provide some relevant references?
I guess this is the negative log-likelihood of a Gaussian distribution. That is, the loss minimizes

$$-\log\left(\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right) = \frac{(x-\mu)^2}{2\sigma^2} + \log\sigma + \frac{1}{2}\log(2\pi),$$

where the constant $\frac{1}{2}\log(2\pi)$ is dropped. The first term is `0.5 * diff.pow(2)` in the code and the second is `torch.log(pred_next_latent_sigma)`.
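This can be checked numerically: the snippet's loss should equal PyTorch's built-in `nn.GaussianNLLLoss` with `full=False` (which also drops the $\frac{1}{2}\log(2\pi)$ constant) applied with `var = sigma**2`. A minimal sketch with hypothetical tensor shapes standing in for the model's outputs:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the predicted next-latent distribution
# and the encoder's target latent (shapes are arbitrary here).
pred_next_latent_mu = torch.randn(8, 4)
pred_next_latent_sigma = torch.rand(8, 4) + 0.5  # keep sigma well above 0
next_h = torch.randn(8, 4)

# Loss as written in the quoted snippet: per-dimension Gaussian NLL,
# constant term dropped.
diff = (pred_next_latent_mu - next_h.detach()) / pred_next_latent_sigma
loss = torch.mean(0.5 * diff.pow(2) + torch.log(pred_next_latent_sigma))

# Same quantity via the built-in loss: 0.5 * (log(var) + (x - mu)^2 / var)
# with var = sigma^2 gives log(sigma) + 0.5 * diff^2 elementwise.
nll = torch.nn.GaussianNLLLoss(full=False, reduction="mean")
loss_builtin = nll(pred_next_latent_mu, next_h, pred_next_latent_sigma.pow(2))

print(torch.allclose(loss, loss_builtin, atol=1e-6))
```

If the two values agree, the snippet is exactly a Gaussian maximum-likelihood loss on the latent transition, with `log(sigma)` acting as the term that penalizes the model for predicting large uncertainty.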