0Addicted0 closed this issue 2 months ago.
Not sure about the second question, but probably yes.
Thank you for your timely response.
I think I misunderstood the meaning of line 246 in stable-baselines3-contrib/sb3_contrib/trpo/trpo.py.
For the second question, I just provided the action masks as you said, and at least it runs well in my custom environment.
🙂 Thank you very much!
❓ Question
🙏 Thanks for making sb3-contrib so extensible.
I am using MaskablePPO as a reference for adding action masks to TRPO. In the train function, I found the following code:
❓ It looks like old_distribution and distribution are exactly the same (so kl_div here equals 0). Is that right, or did I misread something?
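On the kl_div question: in standard TRPO, the KL between the current policy and a frozen copy of itself is indeed 0 in value (and in gradient) at the current parameters, but its Hessian is the Fisher information matrix, which is what the natural-gradient (conjugate-gradient) step needs. The pure-Python sketch below checks this numerically for a single categorical policy with made-up logits, using finite differences instead of autograd. The names (kl_to_frozen, theta_old, v) and values are hypothetical; this is an illustration of the TRPO idea, not sb3-contrib's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_to_frozen(theta, theta_old):
    """KL(pi_theta || pi_theta_old) for categorical policies given by logits.

    theta_old stands in for the frozen (no-grad) old_distribution.
    """
    p = softmax(theta)
    q = softmax(theta_old)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def shift(theta, v, eps):
    """Return theta + eps * v."""
    return [t + eps * d for t, d in zip(theta, v)]


# Hypothetical logits and search direction (illustrative values only).
theta_old = [0.5, -0.3, 1.2, 0.0]
v = [1.0, -2.0, 0.5, 0.3]
eps = 1e-4

# 1) At theta == theta_old the KL value is exactly zero.
kl_value = kl_to_frozen(theta_old, theta_old)

# 2) Its directional derivative along v (central difference) is ~0 too.
f_plus = kl_to_frozen(shift(theta_old, v, eps), theta_old)
f_minus = kl_to_frozen(shift(theta_old, v, -eps), theta_old)
grad_along_v = (f_plus - f_minus) / (2 * eps)

# 3) But the curvature along v, i.e. v^T H v, is clearly non-zero.
curvature_along_v = (f_plus - 2 * kl_value + f_minus) / eps**2

# For softmax logits, the Hessian at theta_old is the Fisher matrix
# F = diag(p) - p p^T, so v^T F v = sum(p_i v_i^2) - (sum(p_i v_i))^2.
p = softmax(theta_old)
mean_v = sum(pi * vi for pi, vi in zip(p, v))
fisher_quadratic = sum(pi * vi * vi for pi, vi in zip(p, v)) - mean_v**2

print(kl_value, grad_along_v, curvature_along_v, fisher_quadratic)
```

So a kl_div that evaluates to 0 before the update is expected: only its second derivatives matter at that point. In standard TRPO, the KL is evaluated again after the parameters move (during the line search), where it is no longer zero.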
By the way, may I also ask whether adding action_masks to TRPO requires passing the corresponding masks before computing the distributions used for kl_div?
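A minimal pure-Python sketch of why the mask probably has to reach both distributions in the KL term. Assumptions: masking is done by replacing the logits of invalid actions with a large negative constant (HUGE_NEG = -1e8, chosen for illustration), and the logits and mask are made up. The helper names are hypothetical and not sb3-contrib APIs; this does not show how sb3-contrib wires masks into TRPO.

```python
import math

HUGE_NEG = -1e8  # illustrative "masked" logit value (assumption)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def apply_mask(logits, mask):
    """Replace the logits of invalid actions (mask False) with HUGE_NEG."""
    return [x if valid else HUGE_NEG for x, valid in zip(logits, mask)]


def categorical_kl(logits_p, logits_q):
    """KL(p || q) between two categorical distributions given by logits.

    Works in log space, so an action masked under p (probability 0.0)
    contributes 0.0 * (log_p - log_q) == 0.0 instead of dividing by zero.
    """
    log_p = log_softmax(logits_p)
    log_q = log_softmax(logits_q)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical logits before/after an update and a mask forbidding action 2.
old_logits = [1.0, 0.5, -0.2, 2.0]
new_logits = [1.2, 0.3, 0.1, 1.5]
mask = [True, True, False, True]

# Same mask on both distributions: action 2 drops out, KL stays small.
kl_same_mask = categorical_kl(apply_mask(new_logits, mask), apply_mask(old_logits, mask))

# Mask only on the old distribution: the new policy still puts mass on
# action 2, which the old policy gives a log-probability of about -1e8.
kl_old_only = categorical_kl(new_logits, apply_mask(old_logits, mask))

print(f"same mask: {kl_same_mask:.4f}, old-only mask: {kl_old_only:.3e}")
```

With the same mask on both sides, the invalid action contributes nothing and the KL stays a small finite number. Masking only one side makes the KL explode (on the order of 1e7 here), far beyond any reasonable KL bound, so a backtracking line search would reject the step.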
🙂 Thanks a lot!