TRPO: Is fixed_log_probs the same as log_probs?

Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

MIT License


TRPO: Is fixed_log_probs the same as log_probs? #35

Closed: yongpan0715 closed this issue 2 years ago

yongpan0715 commented 2 years ago

In TRPO, is fixed_log_probs the same as log_probs? While debugging, I found that the two have identical values. Does that mean there is no difference between pnew and pold?

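    # Computed once, without gradients, before the update.
    with torch.no_grad():
        fixed_log_probs = policy_net.get_log_prob(states, actions)

    """define the loss function for TRPO"""
    def get_loss(volatile=False):
        with torch.set_grad_enabled(not volatile):
            # Recomputed on every call to get_loss.
            log_probs = policy_net.get_log_prob(states, actions)
            # Surrogate objective: -advantage * exp(log_probs - fixed_log_probs).
            action_loss = -advantages * torch.exp(log_probs - fixed_log_probs)
            return action_loss.mean()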

Khrylx commented 2 years ago

Please refer to https://github.com/Khrylx/PyTorch-RL/issues/21 and https://github.com/Khrylx/PyTorch-RL/issues/11
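
For readers who cannot follow the links, here is a minimal sketch of why the two values can match when first printed and still play different roles. It is not taken from the repository. It uses plain Python instead of PyTorch, a hypothetical one-parameter Gaussian policy (mean `theta`, fixed `SIGMA`), and a finite difference in place of autograd. The names `log_prob`, `surrogate_loss`, and `grad_wrt_theta` are made up for illustration. The point it tries to show: at the old parameters the ratio exp(log_probs - fixed_log_probs) is exactly 1, but the derivative with respect to the new parameters is generally nonzero, and the ratio moves away from 1 as soon as the parameters change.

```python
import math

SIGMA = 1.0  # fixed standard deviation of the toy policy (assumption)


def log_prob(theta, action):
    """Log-density of `action` under a Gaussian N(theta, SIGMA^2)."""
    z = (action - theta) / SIGMA
    return -0.5 * z * z - math.log(SIGMA) - 0.5 * math.log(2 * math.pi)


def surrogate_loss(theta, theta_old, actions, advantages):
    """Mean of -A * pi_theta(a) / pi_theta_old(a), mirroring get_loss."""
    terms = [
        -adv * math.exp(log_prob(theta, a) - log_prob(theta_old, a))
        for a, adv in zip(actions, advantages)
    ]
    return sum(terms) / len(terms)


def grad_wrt_theta(theta, theta_old, actions, advantages, eps=1e-6):
    """Central finite difference in theta only; theta_old is held fixed,
    which plays the role of torch.no_grad() on fixed_log_probs."""
    up = surrogate_loss(theta + eps, theta_old, actions, advantages)
    down = surrogate_loss(theta - eps, theta_old, actions, advantages)
    return (up - down) / (2 * eps)


# Toy batch (made-up numbers).
actions = [0.5, -1.0, 2.0]
advantages = [1.0, -0.5, 2.0]
theta_old = 0.0

# Evaluated at the old parameters: every ratio is 1, so the loss is just
# -mean(advantages). This is the situation in which the two log-probs match.
loss_at_old = surrogate_loss(theta_old, theta_old, actions, advantages)

# The gradient at the same point is still nonzero, so it defines a direction.
grad_at_old = grad_wrt_theta(theta_old, theta_old, actions, advantages)

# After moving the parameters (as a line search would), the ratio is no
# longer 1 and the surrogate loss changes.
theta_new = theta_old - 0.1 * grad_at_old
loss_at_new = surrogate_loss(theta_new, theta_old, actions, advantages)

print(loss_at_old, grad_at_old, loss_at_new)
```

In this sketch the first evaluation corresponds to the debugging observation in the question. Whether and where the repository re-evaluates `get_loss` with updated parameters should be checked against the actual code and the linked issues.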