Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License
1.09k stars 186 forks source link

CNN Policy #8

Open bbalaji-ucsd opened 5 years ago

bbalaji-ucsd commented 5 years ago

Can you please add an example of a CNN policy? All the code is oriented towards MLP policies.

yanglixiaoshen commented 5 years ago

I found that the observation dimension of Hopper-v1/v2 is 11, so it's not an image as a input for network. I think the MLP policy is the suitable method in fixing such gym game. Surely, if you work on another different domain, you can have a try for CNN policy.