Shmuma / ptan

PyTorch Agent Net: reinforcement learning toolkit for pytorch
MIT License

Hi, how do I get a numpy array from a LazyFrame to simply play the trained nets? #19

Open JulesVerny opened 5 years ago

JulesVerny commented 5 years ago

I am into Chapter 7 of your book. It's really impressive; however, many details are buried within this PTAN package.

I believe I have trained a number of nets against Atari games in Chapter 7, but replaying them is causing me some frustration. I tried to modify the play-game code from Chapter 6, but now `state = env.reset()` / `step` returns a `ptan.common.wrappers.LazyFrames` object. It is not obvious how to convert this back into a simple numpy array in order to select a single best action when playing a trained game. `state_v = torch.tensor(np.array([state], copy=False))` raises a TypeError, as numpy does not understand your LazyFrames object type. So it is not obvious how to convert a single observation (a LazyFrames) into a numpy array, and hence into a torch tensor to feed into the DQN network. Hoping for some help so I can continue.

Shmuma commented 5 years ago

Hi!

LazyFrames is not my invention; it was copied from the OpenAI standard wrappers. This class is supposed to avoid keeping copies of the same frame multiple times, for instance in a frame-stacking situation.

This class exposes the `__array__` method, which is the numpy interface for converting anything array-like into an array. So, to convert LazyFrames into an ndarray, you just need to call `np.array(lazy_frames_instance)`. Your example, I guess, returns a TypeError because you're passing `state` inside a list.
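For illustration, here is a minimal sketch of how the `__array__` protocol makes this work. It is just the shape of the idea, not the exact OpenAI/ptan implementation:

```python
import numpy as np

class LazyFrames:
    """Sketch of an OpenAI-style LazyFrames: keep references to the
    individual frames and only materialize the stacked array on demand."""
    def __init__(self, frames):
        self._frames = frames

    def __array__(self, dtype=None):
        # numpy calls this method when np.array() is applied to the instance
        out = np.stack(self._frames)
        if dtype is not None:
            out = out.astype(dtype)
        return out

lazy = LazyFrames([np.zeros((84, 84), dtype=np.float32) for _ in range(4)])
state = np.array(lazy)   # __array__ is invoked here
print(state.shape)       # (4, 84, 84)
```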

Hope this will help.

JulesVerny commented 5 years ago

Hello, thanks for the quick reply. Yep, as you stated, the problem was passing it in as a list `[state]`. I have now tried the following, which gets me a little further: `state_v = torch.tensor(np.array(state, copy=False))` (which does return a tensor from the numpy array), followed by `q_vals = net(state_v).data.numpy()[0]`.

I am now stuck on a RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 4, 8, 8], but got 3-dimensional input of size [4, 84, 84] instead. Even though I thought the Chapter 7 networks took in images to feed the convolutional input, it appears the net object actually expects a minibatch dimension. A little frustrating, as all I want to do is to see the games play.
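For reference, the error means the conv layers expect a 4-D [batch, channels, height, width] input, so a single observation needs a batch dimension added. A minimal sketch of the fix, using a stand-in network and a zero observation instead of the book's trained model:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-ins for the trained DQN and the current observation.
net = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4),  # matches the weight [32, 4, 8, 8] in the error
    nn.Flatten(),
    nn.Linear(32 * 20 * 20, 6),                 # 6 actions, e.g. Pong
)
state = np.zeros((4, 84, 84), dtype=np.float32)

# unsqueeze(0) adds the batch dimension: [4, 84, 84] -> [1, 4, 84, 84]
state_v = torch.tensor(np.array(state, copy=False)).unsqueeze(0)
q_vals = net(state_v).data.numpy()[0]
action = int(np.argmax(q_vals))  # greedy action for playing
print(action)
```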

Shmuma commented 5 years ago

Looks like you're doing something wrong in terms of the wrappers applied. You shouldn't be getting LazyFrames as the state; other wrappers should have converted it already. But it's hard to tell without the full code.

I suggest you copy Chapter06/{03_dqn_play.py + lib/wrappers.py} into Chapter07 and change the model construction (you should use the model from Chapter 7). Then it should work as expected.

Alternatively, you could check the wrapper stack you have in your environment by printing it; that will show all the wrappers.
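For example, assuming the book's `make_env` helper from `lib/wrappers.py` (the exact wrapper names will depend on your setup):

```python
from lib import wrappers

env = wrappers.make_env("PongNoFrameskip-v4")
# Printing a gym environment shows the nested wrapper stack, something like:
# <BufferWrapper<ImageToPyTorch<ProcessFrame84<...<PongNoFrameskip-v4>>>>>
print(env)
```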

JTatts commented 5 years ago

Hi Jules,

Maybe you've solved this by now, but I was stuck for a little while before I worked out the solution. I think Shmuma's solution is the easiest, but there are two changes that you need to make to wrappers.py.

1) The network itself now does input normalisation, so you should remove ScaledFloatFrame from make_env.

2) In ImageToPyTorch, moveaxis has been replaced by swapaxes (presumably for efficiency), so the layout of the input array to the network has changed (see the sketch below).
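The distinction matters because the two calls produce different axis orders. A quick illustration on a hypothetical non-square frame (for square 84x84 Atari frames the shapes coincide, which makes the mismatch easy to miss):

```python
import numpy as np

# Hypothetical HxWxC frame; non-square so the difference is visible.
frame = np.zeros((84, 96, 4), dtype=np.float32)

print(np.moveaxis(frame, 2, 0).shape)  # (4, 84, 96) -- CxHxW, what PyTorch conv layers expect
print(np.swapaxes(frame, 2, 0).shape)  # (4, 96, 84) -- CxWxH, H and W are transposed
```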

After these two changes everything works fine for me.