PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt

Chapter6 DQN Pong can't calculate loss #90

Closed pmsdOliveira closed 3 years ago

pmsdOliveira commented 3 years ago

I'm running the exact same code, yet I get this error:

```
C:\Users\Utilizador\anaconda3\python.exe "C:/Users/Utilizador/Thesis/Deep Reinforcement Learning Hands-On/Chapter6/02_dqn_pong.py"
DQN(
  (conv): Sequential(
    (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
  )
  (fc): Sequential(
    (0): Linear(in_features=3136, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=6, bias=True)
  )
)
880: done 1 games, mean reward -21.000, eps 0.99, speed 720.31 f/s
1848: done 2 games, mean reward -21.000, eps 0.98, speed 659.82 f/s
2670: done 3 games, mean reward -21.000, eps 0.97, speed 642.90 f/s
3492: done 4 games, mean reward -21.000, eps 0.97, speed 666.83 f/s
4659: done 5 games, mean reward -20.600, eps 0.95, speed 671.71 f/s
Best mean reward updated -21.000 -> -20.600, model saved
5765: done 6 games, mean reward -20.167, eps 0.94, speed 629.02 f/s
Best mean reward updated -20.600 -> -20.167, model saved
6682: done 7 games, mean reward -20.143, eps 0.93, speed 645.23 f/s
Best mean reward updated -20.167 -> -20.143, model saved
7668: done 8 games, mean reward -20.125, eps 0.92, speed 628.90 f/s
Best mean reward updated -20.143 -> -20.125, model saved
8648: done 9 games, mean reward -20.000, eps 0.91, speed 606.56 f/s
Best mean reward updated -20.125 -> -20.000, model saved
9919: done 10 games, mean reward -19.700, eps 0.90, speed 623.48 f/s
Best mean reward updated -20.000 -> -19.700, model saved
Traceback (most recent call last):
  File "C:/Users/Utilizador/Thesis/Deep Reinforcement Learning Hands-On/Chapter6/02_dqn_pong.py", line 169, in
    loss_t = calc_loss(batch, net, tgt_net, device=device)
  File "C:/Users/Utilizador/Thesis/Deep Reinforcement Learning Hands-On/Chapter6/02_dqn_pong.py", line 96, in calc_loss
    state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)
RuntimeError: gather_out_cpu(): Expected dtype int64 for index

Process finished with exit code 1
```

I'm relatively new to Python and couldn't find a solution to this kind of problem anywhere else. Has anyone had this same problem or know how to fix it?

pmsdOliveira commented 3 years ago

I just found that, on newer versions of PyTorch, you should define calc_loss like this:

```python
def calc_loss(batch, net, tgt_net, device="cpu"):
    states, actions, rewards, dones, next_states = batch

    states_v = torch.tensor(states).to(device)
    next_states_v = torch.tensor(next_states).to(device)
    actions_v = torch.tensor(actions).to(device, dtype=torch.int64)   # gather() requires int64 indices
    rewards_v = torch.tensor(rewards).to(device)
    done_mask = torch.tensor(dones).to(device, dtype=torch.bool)      # bool tensor for mask indexing

    state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)
    next_state_values = tgt_net(next_states_v).max(1)[0]
    next_state_values[done_mask] = 0.0
    next_state_values = next_state_values.detach()

    expected_state_action_values = next_state_values * GAMMA + rewards_v
    return nn.MSELoss()(state_action_values, expected_state_action_values)
```

The only differences from the code in the book are the declarations of the actions_v and done_mask variables, which are now cast explicitly to torch.int64 and torch.bool.
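
For anyone hitting the same error: newer PyTorch releases require int64 indices for gather() and bool tensors for mask indexing. Below is a minimal sketch (tensor shapes and values are purely illustrative) showing both requirements in isolation:

```python
import torch

# Illustrative shapes only: Q-values for a batch of 4 states with 6 actions each.
q_values = torch.rand(4, 6)
actions = torch.tensor([0, 3, 1, 5])  # default dtype is int64, which gather() accepts

# Select the Q-value of the chosen action for each row.
chosen = q_values.gather(1, actions.unsqueeze(-1)).squeeze(-1)

# Using int32 indices raises "Expected dtype int64 for index" on recent PyTorch:
# q_values.gather(1, actions.to(torch.int32).unsqueeze(-1))

# Mask indexing expects a bool tensor; uint8 masks are deprecated.
done_mask = torch.tensor([False, True, False, False])
next_values = torch.rand(4)
next_values[done_mask] = 0.0

print(chosen, next_values)
```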