araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

[question] Why is the environment instantiated differently for DDPG and DQN? #76

Closed: PierreExeter closed this 4 years ago

PierreExeter commented 4 years ago

Hello,

Thanks a lot for this amazing code.

I noticed that the environment is instantiated differently when using either DQN or DDPG. Specifically, at line 249 of train.py, the env is created with:

env = gym.make(env_id, **env_kwargs)
env.seed(args.seed)

in the case of DQN and DDPG, whereas it is created with the make_env helper:

env = DummyVecEnv([make_env(env_id, 0, args.seed, wrapper_class=env_wrapper, log_dir=log_dir, env_kwargs=env_kwargs)])

for all the other algorithms.

This means that the environment is not vectorized, and it is not possible to specify the log directory or to monitor the training. Why did you make a special case for DQN and DDPG?
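To make the difference concrete, here is a minimal sketch of the two code paths (the env id, seed, and Monitor arguments are placeholders I chose, not the actual values from train.py):

import gym
from stable_baselines.bench import Monitor
from stable_baselines.common.vec_env import DummyVecEnv

env_id = "CartPole-v1"  # placeholder
seed = 0

# Path taken for DQN and DDPG: a plain, unwrapped gym env.
env = gym.make(env_id)
env.seed(seed)

# Path taken for the other algorithms: the env is wrapped in a Monitor
# (which records episode rewards and lengths) and vectorized with DummyVecEnv.
def make_monitored_env():
    monitored = gym.make(env_id)
    monitored.seed(seed)
    return Monitor(monitored, filename=None)  # filename=None: keep stats in memory only

vec_env = DummyVecEnv([make_monitored_env])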

Thanks

araffin commented 4 years ago

Hello,

This means that the environment is not vectorized, and it is not possible to specify the log directory or to monitor the training. Why did you make a special case for DQN and DDPG?

Good point. There was an issue at some point with DQN/DDPG and vectorized environments (the performance was different, so I did that as a safe option). I need to check whether it is still the case; it may be an issue with copying arrays.
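To illustrate what I mean by an issue with copying arrays (a hypothetical example, not the actual stable-baselines code): a DummyVecEnv may return observations from an internal buffer that it overwrites on the next step, so a replay buffer that stores them by reference rather than by copy would see old transitions silently change.

import numpy as np

# Hypothetical illustration of the pitfall, not actual library code:
# buf_obs stands in for a VecEnv's internal observation buffer,
# which is rewritten in place at every step.
buf_obs = np.zeros((1, 3))

stored = []
for step in range(2):
    buf_obs[:] = step                     # the env writes the new observation in place
    stored.append(buf_obs[0])             # bug: stores a view into the shared buffer
    # stored.append(np.copy(buf_obs[0]))  # fix: store an independent copy

print(stored)  # both entries now read [1., 1., 1.]: the first transition was overwritten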

PierreExeter commented 4 years ago

Ok, I will run a short experiment with and without a vectorized environment and let you know if I see any difference.
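Something like this sketch (CartPole-v1, the training budget, and the evaluation setup are placeholder choices on my side, and it assumes a stable-baselines version that ships evaluate_policy):

import time

import gym
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.vec_env import DummyVecEnv

def run(vectorize, env_id="CartPole-v1", seed=0, timesteps=50000):
    base_env = gym.make(env_id)
    base_env.seed(seed)
    # Same model either way; the env is either plain or wrapped in a DummyVecEnv
    env = DummyVecEnv([lambda: base_env]) if vectorize else base_env
    model = DQN("MlpPolicy", env, verbose=0)
    start = time.time()
    model.learn(total_timesteps=timesteps)
    elapsed = time.time() - start
    # Evaluate on a fresh env with a different seed
    eval_env = gym.make(env_id)
    eval_env.seed(seed + 1)
    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=20)
    return mean_reward, std_reward, elapsed

for vectorize in (False, True):
    mean_reward, std_reward, elapsed = run(vectorize)
    print("vectorized={}: return {:.1f} +/- {:.1f}, time {:.1f}s".format(
        vectorize, mean_reward, std_reward, elapsed))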

PierreExeter commented 4 years ago

FYI,

Vectorizing the environment for DQN or DDPG does not seem to significantly affect the training, the average return, or the training time; see the attached results.

comparison_vec_noVec.pdf

araffin commented 4 years ago

Thanks, then I may change that =)

Btw, what did you use to generate that nice pdf?

PierreExeter commented 4 years ago

You're welcome! I used LaTeX, with Matplotlib for the figures.

PierreExeter commented 4 years ago

Btw, I'm not sure whether this will affect the results, but I received this warning during training:

UserWarning: Training and eval env are not of the same type
<Monitor<TimeLimit<PendulumEnv>>> != <stable_baselines.common.vec_env.dummy_vec_env.DummyVecEnv object at 0x7f817ac2a490>
  "{} != {}".format(self.training_env, self.eval_env))

araffin commented 4 years ago

You're welcome! I used LaTeX, with Matplotlib for the figures.

I meant: do you have a script somewhere, or did you write the LaTeX file manually?

Btw, I'm not sure whether this will affect the results, but I received this warning during training:

This is a normal warning when using an eval env: it is hard to check that the training env is the same as the eval env, so instead of throwing an error, we warn the user.
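In other words, the check is just a type comparison, roughly of this shape (a simplified sketch, not the exact callback code), and wrapping both envs the same way makes the warning go away:

import warnings

import gym
from stable_baselines.common.vec_env import DummyVecEnv

training_env = gym.make("Pendulum-v0")                     # plain env, as for DQN/DDPG
eval_env = DummyVecEnv([lambda: gym.make("Pendulum-v0")])  # vectorized eval env

# Simplified sketch of the check: only the types are compared, since
# verifying that two envs are truly equivalent would be hard.
if type(training_env) is not type(eval_env):
    warnings.warn("Training and eval env are not of the same type "
                  "{} != {}".format(training_env, eval_env))

# Wrapping both the same way makes the types match and silences the warning.
training_env = DummyVecEnv([lambda: gym.make("Pendulum-v0")])
assert type(training_env) is type(eval_env)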

PierreExeter commented 4 years ago

I meant: do you have a script somewhere, or did you write the LaTeX file manually?

It's a template I use for writing reports. It's pretty basic, but here it is in case it's of any use.

DQN-DDPG_vec_noVec.zip