PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

dqn_play.py for Chapter07 #43

Closed dorjeduck closed 5 years ago

dorjeduck commented 5 years ago

Hi Maxim,

first of all, fantastic book, thank you so much for that.

I saw that others posted about this before, but I couldn't resolve my problem by reading those issues. I am sure it's an easy task, but it seems I'm too dull here on my side.

I am struggling to adapt 03_dqn_play.py from Chapter 06 to the examples of Chapter 07. With some minor tweaks I am able to save the best nets during training, but I fail when trying to "play" these nets. My problems start with the different wrappers used in Chapter 07, which make env.reset() return a LazyFrames object instead of an observation array.

If somebody out there could write a small script to run the trained nets of Chapter 07, I would highly appreciate it if you could share it. Of course, any pointer on how I can get this done myself would also be highly appreciated.

Thanks

martin

anothercodejunkie commented 5 years ago

I am having the same problem. Here's my code, which probably looks a lot like yours:

```python
import gym
import time
import ptan
import argparse
import numpy as np
import torch

from lib import wrappers
from lib import dqn_model

DEFAULT_ENV_NAME = "PongNoFrameskip-v4"
FPS = 25

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", required=True,
                        help="Model file to load")
    parser.add_argument("-e", "--env", default=DEFAULT_ENV_NAME,
                        help="Environment name to use, default = " + DEFAULT_ENV_NAME)
    parser.add_argument("-r", "--record",
                        help="Directory to store video recording")
    args = parser.parse_args()

    # env = wrappers.make_env(args.env)
    env = ptan.common.wrappers.wrap_dqn(gym.make(args.env))

    if args.record:
        env = gym.wrappers.Monitor(env, args.record)
    net = dqn_model.DQN(env.observation_space.shape, env.action_space.n)
    net.load_state_dict(torch.load(args.model))

    state = env.reset()
    total_reward = 0.0
    while True:
        start_ts = time.time()
        env.render()
        state_v = torch.tensor(np.array([state], copy=False))
        q_vals = net(state_v)
        _, act_v = torch.max(q_vals, dim=1)
        action = int(act_v.item())

        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
        delta = 1 / FPS - (time.time() - start_ts)
        if delta > 0:
            time.sleep(delta)
    print("Total reward: {:.2f}".format(total_reward))
```

And here's the error:

```
File "03_dqn_play.py", line 33, in <module>
    state_v = torch.tensor(np.array([state], copy=False))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'LazyFrames'
```
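For what it's worth, the error appears because numpy cannot coerce the LazyFrames object inside the list into a numeric array. If the LazyFrames class implements the `__array__` protocol (as the book's wrapper does), one workaround is to convert the object explicitly with `np.asarray` before building the tensor. A minimal sketch, using a hypothetical stand-in class since the real one lives in lib/wrappers.py:

```python
import numpy as np

# Hypothetical minimal stand-in for the LazyFrames object returned by the
# Chapter 07 wrappers; the real class lives in lib/wrappers.py.
class LazyFrames:
    def __init__(self, frames):
        self._frames = frames

    def __array__(self, dtype=None):
        # Stack the stored frames into one ndarray on demand
        out = np.stack(self._frames)
        if dtype is not None:
            out = out.astype(dtype)
        return out

# Four stacked 84x84 frames, as the Atari wrappers produce
state = LazyFrames([np.zeros((84, 84), dtype=np.float32) for _ in range(4)])

# Instead of np.array([state], copy=False), which can fail to see the
# __array__ protocol through the list, convert explicitly and add the
# batch dimension afterwards:
batch = np.asarray(state)[None, ...]
print(batch.shape)  # (1, 4, 84, 84)
```

The resulting array can then be passed to `torch.tensor` as before. Treat this as a sketch of the idea, not the fix the thread settles on below.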

anothercodejunkie commented 5 years ago

Upon further digging, I found the answer. The code from 03_dqn_play.py does not change. Copy that over unchanged. What you need to change is in the wrappers.py file.

In ImageToPyTorch.observation(), change np.moveaxis to np.swapaxes.

In make_env(), remove the call that creates a ScaledFloatFrame. The network already does the scaling. Just return env after the BufferWrapper.

Found here: https://github.com/Shmuma/ptan/issues/19#issuecomment-457928530
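To make the first edit concrete, here is a small numpy-only sketch contrasting the two axis operations (function names here are illustrative, not the book's exact code). Note that `moveaxis(obs, 2, 0)` and `swapaxes(obs, 2, 0)` are not the same operation in general; they only produce the same shape here because Atari frames are square (84x84), which is why play-time preprocessing must use whichever one the network was trained with:

```python
import numpy as np

def observation_moveaxis(obs):
    # Move the channel axis to the front: (H, W, C) -> (C, H, W)
    return np.moveaxis(obs, 2, 0)

def observation_swapaxes(obs):
    # Swap axes 2 and 0: (H, W, C) -> (C, W, H)
    return np.swapaxes(obs, 2, 0)

# A dummy 84x84 observation with 4 stacked channels
frame = np.zeros((84, 84, 4), dtype=np.float32)

print(observation_moveaxis(frame).shape)  # (4, 84, 84)
print(observation_swapaxes(frame).shape)  # (4, 84, 84), equal only because H == W
```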

dorjeduck commented 5 years ago

Hey,

thank you so much, anothercodejunkie, for your replies. Unfortunately I am on the move right now and can't implement your suggestions and give you feedback, but I will definitely do so once things are set up and ready again.

Thanks again

Martin