PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Chapter 6 Problem/Bug? On 02_dqn_pong.py #36

Closed icompute386 closed 5 years ago

icompute386 commented 5 years ago

(python36) c:\Anaconda\Deep-Reinforcement-Learning-Hands-On-master\Chapter06>python 02_dqn_pong.py
DQN(
  (conv): Sequential(
    (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
  )
  (fc): Sequential(
    (0): Linear(in_features=3136, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=6, bias=True)
  )
)
762: done 1 games, mean reward -21.000, eps 0.99, speed 1040.47 f/s
1630: done 2 games, mean reward -20.500, eps 0.98, speed 993.68 f/s
Best mean reward updated -21.000 -> -20.500, model saved
2622: done 3 games, mean reward -20.000, eps 0.97, speed 949.88 f/s
Best mean reward updated -20.500 -> -20.000, model saved
3458: done 4 games, mean reward -20.000, eps 0.97, speed 928.02 f/s
4257: done 5 games, mean reward -20.200, eps 0.96, speed 904.58 f/s
5019: done 6 games, mean reward -20.333, eps 0.95, speed 908.89 f/s
5938: done 7 games, mean reward -20.286, eps 0.94, speed 914.17 f/s
6700: done 8 games, mean reward -20.375, eps 0.93, speed 937.75 f/s
7612: done 9 games, mean reward -20.444, eps 0.92, speed 884.04 f/s
8374: done 10 games, mean reward -20.500, eps 0.92, speed 866.52 f/s
9624: done 11 games, mean reward -20.273, eps 0.90, speed 865.36 f/s
Traceback (most recent call last):
  File "02_dqn_pong.py", line 170, in <module>
    loss_t = calc_loss(batch, net, tgt_net, device=device)
  File "02_dqn_pong.py", line 97, in calc_loss
    state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index'
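For context, gather() requires its index tensor to have dtype torch.int64 (a LongTensor). A plausible reason the error only appears on some machines is that NumPy's default integer is 32-bit on Windows, so torch.tensor(actions) comes out as an IntTensor there. A minimal illustration of the mismatch (shapes and values below are made up, not taken from the script):

import numpy as np
import torch

q_values = torch.randn(4, 6)                       # fake Q-values: batch of 4, 6 actions
actions = np.array([0, 2, 5, 1], dtype=np.int32)   # int32, as NumPy defaults to on Windows

idx = torch.tensor(actions).unsqueeze(-1)          # IntTensor index of shape (4, 1)
# q_values.gather(1, idx)                          # RuntimeError: Expected ... Long but got ... Int
q_values.gather(1, idx.long())                     # casting the index to int64 fixes it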

icompute386 commented 5 years ago

Found the documentation for gather's arguments at: https://pytorch.org/docs/stable/torch.html

It looks like gather does require a LongTensor index, so modifying the code as shown below let the script run without the error.

I'm still wondering about the new_state = new_state line (see the sketch after the listing below). I changed it to self.state = new_state, though I'm concerned about performance: running on a Titan 2080 I'm seeing 53 f/s, whereas it looks like you were reporting around 150 f/s on a Titan 1080.

def calc_loss(batch, net, tgt_net, device="cpu"):
    states, actions, rewards, dones, next_states = batch

    states_v = torch.tensor(states).to(device)
    next_states_v = torch.tensor(next_states).to(device)
    actions_v = torch.tensor(actions).to(device)
    rewards_v = torch.tensor(rewards).to(device)
    done_mask = torch.ByteTensor(dones).to(device)

    # original line, fails when actions_v is an IntTensor:
    # state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)

    # gather() expects an int64 (Long) index, hence the explicit .long() cast
    b1 = actions_v.unsqueeze(-1).long()
    state_action_values = net(states_v).gather(1, b1).squeeze(-1)

    next_state_values = tgt_net(next_states_v).max(1)[0]
    next_state_values[done_mask] = 0.0
    next_state_values = next_state_values.detach()

    expected_state_action_values = next_state_values * GAMMA + rewards_v
    return nn.MSELoss()(state_action_values, expected_state_action_values)
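Regarding the new_state question above: that line lives in the agent's play_step(), where the tracked observation has to be advanced to the one just returned by env.step(). A minimal sketch of that part of the loop, reconstructed from memory of the chapter's DQN agent (details may differ from the exact repo version):

new_state, reward, is_done, _ = self.env.step(action)
self.total_reward += reward
# store the transition, then advance the tracked state
exp = Experience(self.state, action, reward, is_done, new_state)
self.exp_buffer.append(exp)
self.state = new_state  # i.e. assign to self.state, not new_state = new_state
if is_done:
    done_reward = self.total_reward
    self._reset()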
Shmuma commented 5 years ago

I'm not 100% sure, but it looks like you're using PyTorch > 0.4, and 0.4 is the latest version the examples support. Work on porting them to 1.0 is in progress.
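If you want to check which version is installed:

import torch
print(torch.__version__)  # the repo's examples were written against 0.4.x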

icompute386 commented 5 years ago

Interestingly, none of the other examples I've run so far from Chapters 7 and 8 had this issue.

dalerobichaudpi commented 3 years ago

This worked for me in PyTorch 1.9 when I hit the same index errors. Thanks a lot for posting it.

state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1).long()).squeeze(-1)

Adding .long() works; after that I trained Pong and Breakout, and they played like a charm.
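For newer PyTorch versions like the 1.9 mentioned above, the dtype can also be fixed where the tensors are created, and a boolean done mask avoids the uint8-indexing deprecation warning (a suggestion of mine, not part of the repo's code):

actions_v = torch.tensor(actions, dtype=torch.int64).to(device)  # gather() wants int64 indices
done_mask = torch.tensor(dones, dtype=torch.bool).to(device)     # bool mask for indexing
next_state_values[done_mask] = 0.0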