Denys88 / rl_games

RL implementations
MIT License

How does the a2c network rescale the range of the action to [-1,1] #285

Closed des-zhong closed 4 months ago

des-zhong commented 4 months ago

I've searched the repo but am still clueless. Is it a tanh function at the end of the actor network? Thank you!

denysm88 commented 4 months ago

I think we are using clamping by default. If you look inside the yaml configs you can find `mu_activation: None`. Feel free to change it to tanh. But during training we still add noise with std and clamp afterwards.
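To make the order of operations concrete, here is a minimal sketch of the behavior described above (optional tanh on the mean, Gaussian exploration noise, then a hard clamp). This is illustrative pseudologic, not rl_games source; the function name and signature are hypothetical.

```python
import math
import random

def sample_action(mu, sigma, use_tanh=False):
    # Hypothetical sketch, not rl_games code.
    # use_tanh mirrors setting `mu_activation: tanh` in the yaml config:
    # it squashes the network's mean output smoothly into (-1, 1).
    if use_tanh:
        mu = math.tanh(mu)
    # During training, Gaussian exploration noise with std sigma is added,
    # so the sampled action can leave the valid range...
    action = mu + random.gauss(0.0, sigma)
    # ...and a hard clamp afterwards guarantees the action stays in [-1, 1].
    return max(-1.0, min(1.0, action))
```

With `mu_activation: None` the clamp alone enforces the bounds; with tanh the mean is already squashed and the clamp only catches the noise.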

des-zhong commented 4 months ago

Thank you for your answer, it helped me a lot. I still have a question though: I'm using IsaacGym to train a robot, and after training the best checkpoint is saved as a .pth file. How can I load the .pth file and deploy it to a real robot? I've searched for a while but am still clueless. Thank you!

denysm88 commented 4 months ago

You can export it to ONNX. I have some examples, but without IsaacGym; probably the best way is to try to do the same but with IsaacGym.