ShangtongZhang / DeepRL

Modularized Implementation of Deep RL Algorithms in PyTorch
MIT License

Running multiple environments #66

Closed · neale closed this issue 4 years ago

neale commented 4 years ago

Hi, I've used this repo for continuous control experiments and found it useful in general. Now I'm running experiments on pixel-based environments, and a single env seems really slow and doesn't use much of the CPU. I changed the default arguments of class Task in envs.py to num_envs=32 and single_process=False to take advantage of vectorized environments, but I'm not sure if that's the intended usage.

ShangtongZhang commented 4 years ago

My suggestion is to never set single_process to False. You can use num_envs=32 with single_process=True; the code will then use one process to simulate all 32 environments, which I believe is usually faster than using 32 processes. Do you have a GPU for the pixel-based environments?
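For concreteness, a sketch of that suggested setup, modeled on the repo's example scripts (the `from deep_rl import *` import and the Breakout game id follow examples.py and are assumptions, not taken from this thread):

```python
from deep_rl import *  # exposes Task and Config, as in examples.py

config = Config()
config.game = 'BreakoutNoFrameskip-v4'  # illustrative game id
# One process simulating 32 environments, per the suggestion above.
config.task_fn = lambda: Task(config.game, num_envs=32, single_process=True)
```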

neale commented 4 years ago

I have a GPU and have set select_device accordingly.
I set single_process=False because that way I don't get an error. If I pass num_envs larger than 1 to Task(), I get an error from the DQN actor. Here's the full stack trace when I run dqn_pixel on Breakout with config.task_fn = lambda: Task(config.game, num_envs=8):

2019-11-18 11:10:39,051 - root - INFO: steps 0, 349525333.33 steps/s
Process DQNActor-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/repos/DeepRL/deep_rl/agent/BaseAgent.py", line 150, in run
    cache.append(self._sample())
  File "/home/ubuntu/repos/DeepRL/deep_rl/agent/BaseAgent.py", line 137, in _sample
    transitions.append(self._transition())
  File "/home/ubuntu/repos/DeepRL/deep_rl/agent/DQN_agent.py", line 32, in _transition
    next_state, reward, done, info = self._task.step([action])
  File "/home/ubuntu/repos/DeepRL/deep_rl/component/envs.py", line 187, in step
    return self.env.step(actions)
  File "/home/ubuntu/repos/baselines/baselines/common/vec_env/vec_env.py", line 108, in step
    return self.step_wait()
  File "/home/ubuntu/repos/DeepRL/deep_rl/component/envs.py", line 139, in step_wait
    obs, rew, done, info = self.envs[i].step(self.actions[i])
  File "/home/ubuntu/repos/baselines/baselines/common/atari_wrappers.py", line 211, in step
    ob, reward, done, info = self.env.step(action)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/gym/core.py", line 261, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/gym/core.py", line 261, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ubuntu/repos/baselines/baselines/common/atari_wrappers.py", line 59, in step
    return self.env.step(ac)
  File "/home/ubuntu/repos/baselines/baselines/common/atari_wrappers.py", line 71, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/ubuntu/repos/DeepRL/deep_rl/component/envs.py", line 64, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/ubuntu/repos/baselines/baselines/common/atari_wrappers.py", line 110, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/ubuntu/repos/baselines/baselines/common/atari_wrappers.py", line 39, in step
    return self.env.step(ac)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 10 is out of bounds for axis 0 with size 4

The index it fails on varies from run to run.
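One guess at what's happening (I haven't confirmed this against the actor code): with num_envs=8 the network sees a batch of 8 observations and returns Q-values of shape (8, 4), and an actor written for a single env that treats that output as one flat vector can pick an index anywhere in [0, 32), which would overflow Breakout's 4-action set and vary from run to run. A minimal NumPy sketch of that shape mismatch:

```python
import numpy as np

num_envs, num_actions = 8, 4  # Breakout exposes 4 actions

# Q-values for a batch of 8 observations, shape (8, 4).
q_values = np.random.randn(num_envs, num_actions)

# argmax without an axis argument flattens the array first, so the
# selected index lives in [0, 32) rather than [0, 4).
flat_action = int(np.argmax(q_values))

print(flat_action)  # any value >= 4 would reproduce
                    # "IndexError: index ... is out of bounds for axis 0 with size 4"
```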

ShangtongZhang commented 4 years ago

DQN doesn't support multiple environments -- those are mainly designed for N-step DQN, A2C, and PPO.
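If you want vectorized environments, use them with one of those agents instead, e.g. something like the A2C setup (the num_workers name here is illustrative, not from this thread):

```python
config.num_workers = 16
# num_envs > 1 is intended for the synchronous actors used by
# N-step DQN / A2C / PPO, not for the one-step DQN actor above.
config.task_fn = lambda: Task(config.game, num_envs=config.num_workers)
```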

neale commented 4 years ago

I see, thanks for the help.