DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

How can one train pixel-based algorithms using the CnnPolicy? [question] #39

Closed FabianSchuetze closed 4 years ago

FabianSchuetze commented 4 years ago

First: This is a wonderful and very instructive repo - thank you very much for creating it!

I would like to train a pixel-based policy for the LunarLander environment. How could this be done? I tried to specify the hyperparameters for the PPO algorithm in the file hyperparams/ppo.yml as follows:

LunarLander-v2:
  env_wrapper:
    - gym.wrappers.resize_observation.ResizeObservation:
        shape: 64
  frame_stack: 4
  n_envs: 1
  n_timesteps: !!float 1e6
  policy: 'CnnPolicy'
  n_steps: 1024
  batch_size: 64
  gae_lambda: 0.98
  gamma: 0.999
  n_epochs: 4
  ent_coef: 0.01

When I try to start training with python train.py --algo ppo --env LunarLander-v2, I receive an assertion error:

AssertionError: You should use NatureCNN only with images not with Box(64, 256) (you are probably using `CnnPolicy` instead of `MlpPolicy`)

Can somebody kindly illustrate how to use a CNN as a feature extractor in this case? I checked python train.py --help but didn't find any indication of how to render the environment and use the resulting images as the state.
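
For reference, I also inspected the default observation space directly. If I am reading the Stable Baselines3 source correctly, the assertion above comes from the is_image_space check in stable_baselines3.common.preprocessing, and LunarLander-v2 returns an 8-dimensional state vector rather than pixels, so ResizeObservation has no image to resize:

import gym
from stable_baselines3.common.preprocessing import is_image_space

env = gym.make("LunarLander-v2")
# The default observation is an 8-dimensional state vector, not pixels.
print(env.observation_space)                  # Box(8,)
# NatureCNN only accepts image-like (H, W, C) uint8 spaces.
print(is_image_space(env.observation_space))  # False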

araffin commented 4 years ago

Hello,

You will need to change the LunarLander environment. I think you should take a look at https://github.com/hill-a/stable-baselines/issues/915. You can also look at our tutorial on custom gym environments (cf. doc).
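
Roughly, the idea is to replace the default state-vector observation with a rendered frame, for instance via a gym.ObservationWrapper. Below is a minimal sketch, not part of the zoo itself, assuming the classic gym API where render(mode="rgb_array") returns an RGB array; the class name and the use of OpenCV for resizing are illustrative choices:

import gym
import numpy as np
import cv2  # illustrative choice for resizing; any image library works


class PixelObservationWrapper(gym.ObservationWrapper):
    """Replace the state-vector observation with a rendered, resized RGB frame."""

    def __init__(self, env, width=84, height=84):
        super().__init__(env)
        self.width = width
        self.height = height
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(height, width, 3), dtype=np.uint8
        )

    def observation(self, observation):
        # Ignore the original state vector and return pixels instead.
        frame = self.env.render(mode="rgb_array")
        frame = cv2.resize(frame, (self.width, self.height), interpolation=cv2.INTER_AREA)
        return frame.astype(np.uint8)

Once the environment exposes image observations like this, CnnPolicy (together with frame_stack, as in the zoo's Atari configs) can be used directly.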

FabianSchuetze commented 4 years ago

Great - thank you very much for your kind and informative reply!