I found this in focops.py:
ratio = torch.exp(log_prob - log_prob_b)
temp_kl = torch.distributions.kl_divergence(
    distribution, old_distribution_b
).sum(-1, keepdim=True)
loss_pi = (temp_kl - (1 / FOCOPS_LAM) * ratio * adv_b) * (
    temp_kl.detach() <= dict_args['target_kl']
).type(torch.float32)
loss_pi = loss_pi.mean()
Assuming a minibatch size of 64, temp_kl.shape is (64, 1) because keepdim=True is used when computing temp_kl, but the other tensors in the loss_pi expression (ratio, adv_b) have shape (64,). Broadcasting therefore makes loss_pi.shape equal to (64, 64) instead of (64, 1) or (64,). So why not keep the dimensions consistent?
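Here is a minimal sketch of the broadcasting behaviour in question; the shapes, the lam value, and the random tensors are assumptions for illustration, not taken from the repository:

import torch

# Assumed shapes mirroring the question: (64, 1) KL term vs. (64,) ratio/advantage.
batch = 64
temp_kl = torch.randn(batch, 1)   # shape (64, 1), as produced by .sum(-1, keepdim=True)
ratio = torch.randn(batch)        # shape (64,)
adv_b = torch.randn(batch)        # shape (64,)
lam = 1.5                         # placeholder for FOCOPS_LAM

loss_pi = temp_kl - (1 / lam) * ratio * adv_b
print(loss_pi.shape)              # torch.Size([64, 64]) from broadcasting (64, 1) against (64,)

# One possible fix: drop the trailing dimension so every term has shape (64,)
loss_pi_fixed = temp_kl.squeeze(-1) - (1 / lam) * ratio * adv_b
print(loss_pi_fixed.shape)        # torch.Size([64])

Using squeeze(-1) (or simply omitting keepdim=True) keeps every term at shape (64,), so the final .mean() averages one loss value per sample rather than a 64x64 broadcasted matrix.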