facebookresearch / hanabi_SAD

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

How to run pre-trained IQL? #32

Closed ravihammond closed 2 years ago

ravihammond commented 2 years ago

Hi, thanks for this great repo!

When I run the download.sh script, it downloads the sad_models directory, which contains many iql_Xp_XX.pthw files. I would like to evaluate these pre-trained IQL models, but your existing scripts don't support an option for IQL.

I also tried running the evaluate_saved_model function; however, it expects a train.log file.

Thanks in advance for the advice!

hengyuan-hu commented 2 years ago

Can you try using the "--paper sad" flag?

ravihammond commented 2 years ago

When I run the command:

python tools/eval_model.py --weight ../models/sad_models/iql_2p_1.pthw --num_player 2 --paper sad

I get the following output:

warning: pred.weight not loaded
warning: pred.bias not loaded
warning: pred.weight not loaded
warning: pred.bias not loaded
terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/app/pyhanabi/r2d2.py", line 271, in act
        }

        greedy_action, new_hid = self.greedy_act(priv_s, legal_move, hid)
                                 ~~~~~~~~~~~~~~~ <--- HERE

        random_action = legal_move.multinomial(1).squeeze(1)
  File "/app/pyhanabi/r2d2.py", line 241, in greedy_act
        hid: Dict[str, torch.Tensor],
    ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
        adv, new_hid = self.online_net.act(priv_s, hid)
                       ~~~~~~~~~~~~~~~~~~~ <--- HERE
        legal_adv = (1 + adv - adv.min()) * legal_move
        greedy_action = legal_adv.argmax(1).detach()
  File "/app/pyhanabi/r2d2.py", line 72, in act

        priv_s = priv_s.unsqueeze(0)
        x = self.net(priv_s)
            ~~~~~~~~ <--- HERE
        o, (h, c) = self.lstm(x, (hid["h0"], hid["c0"]))
        if self.skip_connect:
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    def forward(self, input):
        for module in self:
            input = module(input)
                    ~~~~~~ <--- HERE
        return input
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)
               ~~~~~~~~ <--- HERE
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
    if has_torch_function_variadic(input, weight, bias):
        return handle_torch_function(linear, (input, weight, bias), input, weight, bias=bias)
    return torch._C._nn.linear(input, weight, bias)
           ~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: mat1 and mat2 shapes cannot be multiplied (364x838 and 783x512)

Aborted (core dumped)
hengyuan-hu commented 2 years ago

Ah, sorry. You need to turn off the SAD field (i.e. the extra greedy action) in the input encoding.

That is, set this value to False when evaluating an agent trained without the SAD encoding scheme: https://github.com/facebookresearch/hanabi_SAD/blob/415804b531447bb4b8adb12100f994d588589cd8/pyhanabi/tools/eval_model.py#L35
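The shape mismatch in the traceback is consistent with this explanation: the SAD field appends the greedy action to the observation, so the encoded input is 838-dimensional, while the IQL network's first linear layer was trained on 783-dimensional inputs. A minimal numpy sketch of the failure, with shapes taken from the error message (attributing the 55-dimension difference to the greedy-action block is an inference, not confirmed in this thread):

```python
import numpy as np

batch, trained_in, hidden = 364, 783, 512
weight = np.zeros((trained_in, hidden))   # first layer of a model trained without SAD

# With the SAD field on, the encoder emits 838 features -> matmul fails
x_sad = np.zeros((batch, 838))
try:
    x_sad @ weight
except ValueError as e:
    print("shape mismatch:", e)

# With the SAD field off, the shapes line up again
x_plain = np.zeros((batch, trained_in))
print((x_plain @ weight).shape)           # (364, 512)
```

This mirrors the `mat1 and mat2 shapes cannot be multiplied (364x838 and 783x512)` error: the fix is to make the encoding match what the network was trained on, not to change the network.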

ravihammond commented 2 years ago

Great, that works. Thanks!