DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Bug] v1.1.0a11 MultiInputPolicy DQN Runtime Error: device mismatch #491

Closed. minhlong94 closed this issue 3 years ago.

minhlong94 commented 3 years ago

🐛 Bug

RuntimeError: device mismatch (CPU vs. CUDA) when using MultiInputPolicy with DQN on version 1.1.0a11, with a custom environment and a custom features extractor.

To Reproduce

Since I am not allowed to publish the full code yet, I will do my best to describe the steps to reproduce. Identifying names have been replaced with XXX (I may have missed some).

Here is the observation and action space:

self.action_space = gym.spaces.Discrete(7)
obs_space = dict(
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(6,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(7,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(1,), dtype=np.float32),
    XXX=gym.spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
    XXX=gym.spaces.Discrete(7),
    XXX=gym.spaces.Box(low=0, high=float("inf"), shape=(6,), dtype=np.float32),
)
self.observation_space = gym.spaces.Dict(obs_space)

And this is the Features Extractor:

class XXXFeatureExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.spaces.Dict, features_dim: int = 256):
        super(XXXFeatureExtractor, self).__init__(observation_space, features_dim)
        self.XXX = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())
        self.XXX = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())
        self.XXX = nn.Sequential(nn.Linear(1, 128), nn.ReLU())
        self.XXX = nn.Sequential(nn.Linear(1, 128), nn.ReLU())
        self.XXX = nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Flatten())
        self.XXX = nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        XXX = self.XXX(observations["XXX"].unsqueeze(-2))
        XXX = self.XXX(observations["XXX"].unsqueeze(-2))
        XXX = self.XXX(observations["XXX"])
        XXX = self.XXX(observations["XXX"])
        XXX = self.XXX(observations["XXX"])
        XXX = self.XXX(observations["XXX"].unsqueeze(-2))
        cat = torch.cat((...), dim=1)
        self.last_layer = nn.Sequential(nn.Linear(cat.shape[1], self.features_dim*2), nn.Tanh(),
                                        nn.Linear(self.features_dim*2, self.features_dim), nn.Tanh())
        out = self.last_layer(cat)
        return out
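The forward pass reads `cat.shape[1]` at runtime, which is presumably why the head is built there. The same width can be worked out ahead of time with the output-length formula from the PyTorch `nn.Conv1d` documentation. A small sketch, assuming the i-th branch in `__init__` consumes the i-th key of the observation space (the keys are redacted, so this mapping is a guess) and that the Discrete(7) key reaches the extractor one-hot encoded:

```python
def conv1d_out_len(l_in: int, kernel: int, stride: int = 1,
                   padding: int = 0, dilation: int = 1) -> int:
    """Output length of nn.Conv1d, per the formula in the PyTorch Conv1d docs."""
    return (l_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Assumed mapping: the i-th branch in __init__ consumes the i-th key of the
# observation space. Conv1d(1, 128, 4, 1) + Flatten yields 128 * L_out features.
branch_widths = [
    128 * conv1d_out_len(6, kernel=4),  # Box (6,): 128 * 3 = 384
    128 * conv1d_out_len(7, kernel=4),  # Box (7,): 128 * 4 = 512
    128,                                # Linear(1, 128) on Box (1,)
    128,                                # Linear(1, 128) on Box (1,)
    128,                                # Linear(7, 128) on one-hot Discrete(7)
    128 * conv1d_out_len(6, kernel=4),  # Box (6,): 384
]
n_concat = sum(branch_widths)
print(n_concat)  # 1664
```

With `n_concat` known, `nn.Linear(n_concat, features_dim * 2)` can be declared in `__init__` like every other layer.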

And I call the training like usual:

model = DQN("MultiInputPolicy",...)
...

THE PROBLEM: if the device is set to cuda, the following error is raised; with cpu, it works perfectly fine.

***EVALUATING: DQN***
Using cuda device
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "main.py", line 219, in <module>
    main()
  File "main.py", line 182, in main
    dqn_eval.test("results/testDQN", ...)
  File "E:\XXX\evaluators.py", line 189, in test
    evaluate_policy(self.model, env, n_eval_episodes=200)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\common\evaluation.py", line 85, in evaluate_policy
    actions, states = model.predict(observations, state=states, deterministic=deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\dqn.py", line 223, in predict
    action, state = self.policy.predict(observation, state, mask, deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\common\policies.py", line 302, in predict
    actions = self._predict(observation, deterministic=deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\policies.py", line 175, in _predict
    return self.q_net._predict(obs, deterministic=deterministic)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\policies.py", line 69, in _predict
    q_values = self.forward(observation)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\dqn\policies.py", line 66, in forward
    return self.q_net(self.extract_features(obs))
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\stable_baselines3\common\policies.py", line 128, in extract_features
    return self.features_extractor(preprocessed_obs)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\XXX\feature_extractor.py", line 36, in forward
    out = self.last_layer(cat)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument 1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

The error also happens in a Kaggle kernel.

Miffyli commented 3 years ago

The problem is that a new layer is created inside the forward function. This part:

self.last_layer = nn.Sequential(nn.Linear(cat.shape[1], self.features_dim*2), nn.Tanh(),
                                nn.Linear(self.features_dim*2, self.features_dim), nn.Tanh())

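You should create all layers in the __init__ function. Otherwise these layers are not included in training (the optimizer is built before forward ever runs, so it never receives their parameters), and the layers are re-created, with fresh random weights, on every network call. New layers are also created on the CPU by default, which is why the error only appears with cuda. If you really must do this, then you need to move these layers to the model's device yourself (see the PyTorch docs).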
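A minimal sketch of the suggested fix, written in plain PyTorch so it runs without SB3 (in SB3 the class would subclass BaseFeaturesExtractor and pass observation_space and features_dim to its __init__). The key names "a" to "f" are hypothetical stand-ins for the redacted XXX keys, each branch is assumed to consume the key declared in the same position, and the Discrete(7) key is assumed to arrive as a (batch, 7) one-hot float tensor. The head's input width is measured once in __init__ with a dummy batch instead of on every call:

```python
import torch
import torch.nn as nn


class FixedFeatureExtractor(nn.Module):
    """Hypothetical corrected extractor: every layer is built in __init__."""

    # Hypothetical key names and flat sizes standing in for the redacted XXX keys.
    SIZES = {"a": 6, "b": 7, "c": 1, "d": 1, "e": 7, "f": 6}
    CONV_KEYS = {"a", "b", "f"}  # branches that need a channel dimension

    def __init__(self, features_dim: int = 256):
        super().__init__()
        self.features_dim = features_dim
        self.branches = nn.ModuleDict({
            "a": nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten()),
            "b": nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten()),
            "c": nn.Sequential(nn.Linear(1, 128), nn.ReLU()),
            "d": nn.Sequential(nn.Linear(1, 128), nn.ReLU()),
            "e": nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Flatten()),
            "f": nn.Sequential(nn.Conv1d(1, 128, 4, 1), nn.ReLU(), nn.Flatten()),
        })
        # Measure the concatenated width once, using a dummy batch of size 1.
        with torch.no_grad():
            n_concat = self._concat(self._dummy_obs(batch=1)).shape[1]
        # Built here, the head is registered before any optimizer is created
        # and moves together with the other branches on model.to(device).
        self.last_layer = nn.Sequential(
            nn.Linear(n_concat, features_dim * 2), nn.Tanh(),
            nn.Linear(features_dim * 2, features_dim), nn.Tanh(),
        )

    @classmethod
    def _dummy_obs(cls, batch: int) -> dict:
        return {k: torch.zeros(batch, n) for k, n in cls.SIZES.items()}

    def _concat(self, observations: dict) -> torch.Tensor:
        parts = []
        for key, branch in self.branches.items():
            x = observations[key]
            if key in self.CONV_KEYS:
                x = x.unsqueeze(-2)  # (batch, L) -> (batch, 1, L) for Conv1d
            parts.append(branch(x))
        return torch.cat(parts, dim=1)

    def forward(self, observations: dict) -> torch.Tensor:
        return self.last_layer(self._concat(observations))
```

Because `last_layer` exists at construction time, it is part of `parameters()` when the optimizer is created, keeps its trained weights across calls, and follows `.to("cuda")` along with the rest of the network.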

minhlong94 commented 3 years ago

Oh I see, that is indeed a serious mistake. I will close the issue accordingly then.