[Bug] v1.1.0a11 MultiInputPolicy DQN Runtime Error: device mismatch #491

Closed (minhlong94 closed this issue 3 years ago)

minhlong94 commented 3 years ago

🐛 Bug

A device-mismatch RuntimeError (CPU vs. CUDA) is raised when using DQN with MultiInputPolicy on version 1.1.0a11, together with a custom environment and a custom features extractor.

To Reproduce

Since I am not allowed to publish the full code yet, I will do my best to describe the steps to reproduce. Identifying names are replaced with XXX (and I may have missed some).

Here is the observation and action space:

self.action_space = gym.spaces.Discrete(7)
obs_space = dict(
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(6,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(7,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(1,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(6,), dtype=np.float32),
)
self.observation_space = gym.spaces.Dict(obs_space)

And this is the Features Extractor:

class XXXFeatureExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.spaces.Dict, features_dim: int = 256):
        super(XXXFeatureExtractor, self).__init__(observation_space, features_dim)
        self.XXX = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())
        self.XXX = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())
        self.XXX = nn.Sequential(nn.Linear(1, 128), nn.ReLU())
        self.XXX = nn.Sequential(nn.Linear(1, 128), nn.ReLU())
        self.XXX = nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Flatten())
        self.XXX = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        XXX = self.XXX(observations["XXX"].unsqueeze(-2))
        XXX = self.XXX(observations["XXX"].unsqueeze(-2))
        XXX = self.XXX(observations["XXX"])
        XXX = self.XXX(observations["XXX"])
        XXX = self.XXX(observations["XXX"])
        XXX = self.XXX(observations["XXX"].unsqueeze(-2))
        cat = torch.cat((...), dim=1)
        self.last_layer = nn.Sequential(nn.Linear(cat.shape[1], self.features_dim*2), nn.Tanh(),
                                        nn.Linear(self.features_dim*2, self.features_dim), nn.Tanh())
        out = self.last_layer(cat)
        return out

And I call the training like usual:

model = DQN("MultiInputPolicy",...)

THE PROBLEM: if the device is set to cuda, it throws the following traceback, while if it is set to cpu it works perfectly fine.

Using cuda device
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "main.py", line 219, in <module>
  File "main.py", line 182, in main
    dqn_eval.test("results/testDQN", ...)
  File "E:\XXX\evaluators.py", line 189, in test
    evaluate_policy(self.model, env, n_eval_episodes=200)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\common\evaluation.py", line 85, in evaluate_policy
    actions, states = model.predict(observations, state=states, deterministic=deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\dqn.py", line 223, in predict
    action, state = self.policy.predict(observation, state, mask, deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\common\policies.py", line 302, in predict
    actions = self._predict(observation, deterministic=deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\policies.py", line 175, in _predict
    return self.q_net._predict(obs, deterministic=deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\policies.py", line 69, in _predict
    q_values = self.forward(observation)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\policies.py", line 66, in forward
    return self.q_net(self.extract_features(obs))
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\common\policies.py", line 128, in extract_features
    return self.features_extractor(preprocessed_obs)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\XXX\feature_extractor.py", line 36, in forward
    out = self.last_layer(cat)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument 1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

The error also occurs in a Kaggle kernel.

Miffyli commented 3 years ago

The problem is that you are creating a new layer inside the forward function. This part:

self.last_layer = nn.Sequential(nn.Linear(cat.shape[1], self.features_dim*2), nn.Tanh(),
                                nn.Linear(self.features_dim*2, self.features_dim), nn.Tanh())

You should create all layers in the __init__ function. Otherwise these layers are not included in training, and you re-create them (with fresh random weights) on every network call. A newly created layer also lives on the CPU by default, which is why it clashes with inputs that are on CUDA. If you really must do this, you need to move these layers to the model's device (see the PyTorch docs).
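For illustration, here is a minimal sketch of the advice above, written as a plain torch.nn.Module so it runs without the original environment. The class name ConcatExtractor, the observation keys "vec" and "scalar", and the input sizes are all hypothetical stand-ins for the redacted names. In the setup from this issue the class would presumably subclass BaseFeaturesExtractor instead. The input width of the last layer is inferred once in __init__ with a dummy forward pass, so nothing is created inside forward.

```python
from typing import Dict

import torch
import torch.nn as nn


class ConcatExtractor(nn.Module):
    """Hypothetical two-input extractor; every layer is built in __init__."""

    def __init__(self, features_dim: int = 256):
        super().__init__()
        # Per-input branches (keys and sizes are illustrative assumptions).
        self.vec_branch = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())
        self.scalar_branch = nn.Sequential(nn.Linear(1, 128), nn.ReLU())

        # Infer the concatenated width once, using a dummy batch of size 1.
        with torch.no_grad():
            dummy = {"vec": torch.zeros(1, 6), "scalar": torch.zeros(1, 1)}
            n_concat = self._concat(dummy).shape[1]

        # Registered as a submodule here, so .to(device) moves it and
        # parameters() exposes it to the optimizer.
        self.last_layer = nn.Sequential(
            nn.Linear(n_concat, features_dim * 2), nn.Tanh(),
            nn.Linear(features_dim * 2, features_dim), nn.Tanh(),
        )

    def _concat(self, obs: Dict[str, torch.Tensor]) -> torch.Tensor:
        vec = self.vec_branch(obs["vec"].unsqueeze(-2))
        scalar = self.scalar_branch(obs["scalar"])
        return torch.cat((vec, scalar), dim=1)

    def forward(self, obs: Dict[str, torch.Tensor]) -> torch.Tensor:
        return self.last_layer(self._concat(obs))


# Falls back to CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
extractor = ConcatExtractor().to(device)
obs = {"vec": torch.rand(4, 6, device=device), "scalar": torch.rand(4, 1, device=device)}
features = extractor(obs)
```

Because last_layer is assigned in __init__, the single .to(device) call covers it along with the branches. If a layer really had to be created inside forward, it would need something like .to(cat.device) on each call, and its weights would still be missing from any optimizer that was built earlier.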

minhlong94 commented 3 years ago

Oh I see, that is indeed a serious mistake. I will close the issue accordingly then.