TianQi-777 opened this issue 3 years ago
Hi, my confusion is why the loss of `transition_reward_model` is computed as follows (here):

```python
diff = (pred_next_latent_mu - next_h.detach()) / pred_next_latent_sigma
loss = torch.mean(0.5 * diff.pow(2) + torch.log(pred_next_latent_sigma))
```
Especially the term `torch.log(pred_next_latent_sigma)` — can you explain it or provide some relevant references?
I guess this is the negative log-likelihood of a Gaussian distribution. That is, the loss minimizes

$$-\log\left(\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right) = \frac{(x-\mu)^2}{2\sigma^2} + \log\sigma + \frac{1}{2}\log(2\pi),$$

where the constant $\frac{1}{2}\log(2\pi)$ is dropped. The first term is `0.5 * diff.pow(2)` in the code and the second is `torch.log(pred_next_latent_sigma)`.
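This can be checked numerically: the snippet's loss should equal PyTorch's built-in `nn.GaussianNLLLoss` with `full=False` (which also drops the $\frac{1}{2}\log(2\pi)$ constant) applied with `var = sigma**2`. A minimal sketch with hypothetical tensor shapes standing in for the model's outputs:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the predicted next-latent distribution
# and the encoder's target latent (shapes are arbitrary here).
pred_next_latent_mu = torch.randn(8, 4)
pred_next_latent_sigma = torch.rand(8, 4) + 0.5  # keep sigma well above 0
next_h = torch.randn(8, 4)

# Loss as written in the quoted snippet: per-dimension Gaussian NLL,
# constant term dropped.
diff = (pred_next_latent_mu - next_h.detach()) / pred_next_latent_sigma
loss = torch.mean(0.5 * diff.pow(2) + torch.log(pred_next_latent_sigma))

# Same quantity via the built-in loss: 0.5 * (log(var) + (x - mu)^2 / var)
# with var = sigma^2 gives log(sigma) + 0.5 * diff^2 elementwise.
nll = torch.nn.GaussianNLLLoss(full=False, reduction="mean")
loss_builtin = nll(pred_next_latent_mu, next_h, pred_next_latent_sigma.pow(2))

print(torch.allclose(loss, loss_builtin, atol=1e-6))
```

If the two values agree, the snippet is exactly a Gaussian maximum-likelihood loss on the latent transition, with `log(sigma)` acting as the term that penalizes the model for predicting large uncertainty.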