In TRPO, is fixed_log_probs the same as log_probs? When I debug the program, the two print identical values. Does that mean there is no difference between pnew and pold?
with torch.no_grad():
    fixed_log_probs = policy_net.get_log_prob(states, actions)

"""define the loss function for TRPO"""
def get_loss(volatile=False):
    with torch.set_grad_enabled(not volatile):
        log_probs = policy_net.get_log_prob(states, actions)
        action_loss = -advantages * torch.exp(log_probs - fixed_log_probs)
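One way to see what is going on is a small sketch in plain Python. It assumes a hypothetical one-parameter Gaussian policy (`log_prob`, `theta`, and `surrogate_loss` are illustrative names standing in for `policy_net.get_log_prob` and the loss above, not code from the original program). Before any update the two log-probabilities are numerically equal, so the ratio is exactly 1, but `fixed_log_probs` is held constant while `log_probs` still depends on the parameters. The gradient of the loss is therefore non-zero, and the ratio moves away from 1 as soon as the parameters change, as they typically do during the TRPO line search.

```python
import math

def log_prob(theta, action):
    """Log-density of a unit-variance Gaussian policy with mean theta.
    Hypothetical stand-in for policy_net.get_log_prob(states, actions)."""
    return -0.5 * (action - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)

def surrogate_loss(theta, theta_old, action, advantage):
    # fixed_log_prob plays the role of the value computed under no_grad:
    # it depends only on the old parameters and is treated as a constant.
    fixed_log_prob = log_prob(theta_old, action)
    ratio = math.exp(log_prob(theta, action) - fixed_log_prob)
    return -advantage * ratio

theta_old = 0.3
action, advantage = 1.0, 2.0

# At theta == theta_old the ratio is exp(0) = 1, so the loss is -advantage.
loss_at_old = surrogate_loss(theta_old, theta_old, action, advantage)

# Central finite difference with respect to the *current* parameter only.
eps = 1e-6
grad = (surrogate_loss(theta_old + eps, theta_old, action, advantage)
        - surrogate_loss(theta_old - eps, theta_old, action, advantage)) / (2 * eps)

# After one small gradient step the new and old policies differ.
theta_new = theta_old - 0.1 * grad
ratio_after = math.exp(log_prob(theta_new, action) - log_prob(theta_old, action))

print("loss at old params:", loss_at_old)
print("gradient at old params:", grad)
print("ratio after one step:", ratio_after)
```

Under these assumptions, identical printed values before the update are expected rather than a bug: pnew and pold coincide in value at that point and only start to differ once the loss is re-evaluated with updated parameters.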