Closed pavitrakumar78 closed 7 years ago
Hi,
I found a small bug in the code that could cause some agents to get stuck (always choosing the same action) and therefore stop generating useful experience. So my guess is that in your case, after a while, some if not all agents just got stuck. I already pushed a fix. I tested it for 6m iterations with 16 generating agents and reached an average score of 10 points, and 30 points after 10m iterations.
I don't know the specs of the p2.xlarge instance, but ~33m frames in 20-22h is quite slow. On my machine (i5-6600K, GTX1080) training for 80m frames took about 23h in total, so that's strange.
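To put the two reports side by side, the throughput can be worked out directly from the numbers quoted here. This is only a rough sketch: it takes 21h as the midpoint of the reported 20-22h, and frames_per_second is a hypothetical helper, not something from the repository.

```python
def frames_per_second(frames, hours):
    """Average training throughput over a whole run."""
    return frames / (hours * 3600.0)

# Figures quoted in this thread; 21 h is an assumed midpoint of "20-22h".
k80_fps = frames_per_second(33e6, 21)
gtx1080_fps = frames_per_second(80e6, 23)

print("K80:     %.0f frames/s" % k80_fps)
print("GTX1080: %.0f frames/s" % gtx1080_fps)
print("ratio:   %.1fx" % (gtx1080_fps / k80_fps))
```

That works out to a bit more than a factor of two between the two machines.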
Answering your questions.
--processes to number of cores on your machine.
learn_proc and specifically by LearningAgent (it counts frames used for learning, not number of batches). Each generate_experience_proc should go slower.
I hope this helps. If you have more questions feel free to ask. :)
Hi,
Thank you for your reply!
I guess the version of the K80 that Amazon uses has significantly lower clock speeds than a GTX1080, which could probably explain why it's slow. But 80m frames in less than a day is blazing fast! Unfortunately I only have a GTX 660 on my desktop, so the K80 is the only way to go for me at the moment.
Thanks for the fix! I will try it again with 16 agents and will let you know how it performs.
Both machines I work with (PC and EC2 instance) have only 4 cores. I will also try running with 4 agents and report the score later on.
Your answers really helped me! :)
Your model (the 91m frames) for Breakout and all other tests were run using the default params in your code, right? (i.e. 4 processes, lr, batch size, etc.), at least for a3c.
In my case GPU usage was about ~30-35% and CPU always 100%, so I think it should work on your desktop quite well.
Yes, I used default parameters for Breakout model and tests. :)
Hi,
Sorry for the late reply!
I have tried running the updated train.py for a3c and I am still getting the same results! :(
Just to confirm, my library versions are: Keras: 1.1.2 Theano: 0.8.2 numpy: 1.11.2 scipy: 0.18.1
I ran the exact same file (train.py) with the default parameters (4 agents). I added some logging functions to it.
Here is the log info from the 0th actor thread (inside generate_experience_proc method): training_log_actor-0 (copy).txt
Here is the full log info from the learner thread: full_log.txt
I ran this on my PC (NVIDIA GTX 660), and it completed about 19m frames in 13-14 hours. Even after 19m frames the average is 1.XX! :( Shouldn't it be at least 10?
Why am I getting vastly lower performance with the same code on my computer?
Looks like something is wrong elsewhere. I suspect this Theano warning might have something to do with it: "Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5."
Please do these four things:
- self.train_net.summary()
- Breakout-v0 (your entropy is different from mine (in my case it was ~0.36), which suggests you have more actions (I have four))
- pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
- After upgrading Theano to bleeding edge, run some simple convolutional network on MNIST to confirm it's learning.
My library versions: Keras: 1.1.1 Theano: 0.9.0.dev3 numpy: 1.11.2 scipy: 0.18.1
I hope we will figure it out soon. :)
Hi,
Thank you for the suggestions!
Network summary
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 3, 84, 84)     0
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 16, 20, 20)    3088        input_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 32, 9, 9)      8224        convolution2d_1[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 2592)          0           convolution2d_2[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 256)           663808      flatten_1[0][0]
____________________________________________________________________________________________________
value (Dense)                    (None, 1)             257         dense_1[0][0]
____________________________________________________________________________________________________
policy (Dense)                   (None, 6)             1542        dense_1[0][0]
====================================================================================================
Total params: 676919
____________________________________________________________________________________________________
None
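As a sanity check on the summary above, the parameter counts can be reproduced by hand. The kernel sizes used below (8x8 and 4x4) are an assumption inferred from the counts and output shapes; they are not stated anywhere in this thread, and conv_params / dense_params are hypothetical helpers.

```python
# Recompute the parameter counts shown in the network summary.
# Assumption: 8x8 and 4x4 square kernels (inferred, not stated in the thread).

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel 2D convolution."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

counts = {
    "convolution2d_1": conv_params(8, 3, 16),
    "convolution2d_2": conv_params(4, 16, 32),
    "dense_1": dense_params(32 * 9 * 9, 256),  # Flatten gives 32*9*9 = 2592 inputs
    "value": dense_params(256, 1),
    "policy": dense_params(256, 6),            # 6 outputs = 6 actions
}
total = sum(counts.values())

for name in sorted(counts):
    print(name, counts[name])
print("total", total)
```

The policy head is the only layer whose size depends on the action set: with four actions it would have dense_params(256, 4) = 1028 parameters instead of 1542.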
- My action set is the default, i.e. env.action_space, so I will set it as:
if args.game == 'Breakout-v0':
    action_space = Discrete(4)
(env.action_space for Breakout returns Discrete(6))
- Just upgraded it.
I will run the Theano checks.
Edit: Just ran the Theano checks.
Theano version: 0.9.0dev4
Using the CNN code from [here](http://deeplearning.net/tutorial/lenet.html) running on MNIST:
http://pastebin.com/x7wrau79
I will train for 15-20m steps with the changes I've made and let you know the results!
About the theano warning, I have been testing another implementation of original DQN using lasagne (which uses only theano backend) - and I had no problems learning breakout and other games using it.
Don't change action_space because it may break something in gym. You can try using env.get_action_meanings() to see what actions are available to the agent (my case: ['NOOP', 'FIRE', 'RIGHT', 'LEFT']) and post it here.
Model architecture and file seem to be ok.
To make sure it's not a problem with Keras, try:
from keras.models import *
from keras.layers import *
from keras.datasets import mnist
x = Input(shape=(28, 28))
h = Reshape((1, 28, 28))(x)
h = Convolution2D(8, 3, 3, activation='relu')(h)
h = Convolution2D(8, 3, 3, activation='relu')(h)
h = Convolution2D(8, 3, 3, activation='relu')(h)
h = Flatten()(h)
y = Dense(10, activation='softmax')(h)
model = Model(x, y)
model.compile('rmsprop', 'sparse_categorical_crossentropy', metrics=['accuracy'])
(train_x, train_y), _ = mnist.load_data()
model.fit(train_x, train_y, nb_epoch=1)
Result:
Epoch 1/1
60000/60000 [==============================] - 4s - loss: 0.2332 - acc: 0.9416
I guess you should have about ~95% accuracy as well.
The action meanings in gym have always been like this for me (at least for Breakout):
['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
So, in most of my experiments, I had to correct it to 4 or 3.
I installed gym using pip install 'gym[all]' and the gym version is gym-0.7.1. How come your gym only returns 4 actions?
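One way to work with four actions without touching env.action_space is to keep a lookup from the policy's output index to the environment's action index. A minimal sketch using the two action lists quoted in this thread; build_action_map and policy_to_env_action are hypothetical helpers, not part of gym or of this repository.

```python
# Action meanings as reported in this thread.
ENV_MEANINGS = ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
WANTED = ['NOOP', 'FIRE', 'RIGHT', 'LEFT']

def build_action_map(env_meanings, wanted):
    """For each wanted action, find its index in the environment's action list."""
    return [env_meanings.index(name) for name in wanted]

action_map = build_action_map(ENV_MEANINGS, WANTED)

def policy_to_env_action(policy_index):
    """Translate a policy output index (0..3) into an environment action index."""
    return action_map[policy_index]

# Actions the 6-action environment offers beyond the wanted four.
extras = [name for name in ENV_MEANINGS if name not in WANTED]
print(action_map, extras)
```

The policy network would then have len(WANTED) outputs, and the environment itself is left unmodified.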
Also, there was a problem with the keras code.
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from keras.models import *
Using Theano backend.
Using gpu device 0: GeForce GTX 660 (CNMeM is disabled, cuDNN 5105)
>>> from keras.layers import *
>>> from keras.datasets import mnist
>>> x = Input(shape=(28, 28))
>>> h = Reshape((1, 28, 28))(x)
>>> h = Convolution2D(8, 3, 3, activation='relu')(h)
>>> h = Convolution2D(8, 3, 3, activation='relu')(h)
>>> h = Convolution2D(8, 3, 3, activation='relu')(h)
>>> h = Flatten()(h)
>>> y = Dense(10, activation='softmax')(h)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 491, in __call__
    self.build(input_shapes[0])
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 727, in build
    name='{}_W'.format(self.name))
  File "/usr/local/lib/python2.7/dist-packages/keras/initializations.py", line 60, in glorot_uniform
    return uniform(shape, s, name=name)
  File "/usr/local/lib/python2.7/dist-packages/keras/initializations.py", line 33, in uniform
    return K.random_uniform_variable(shape, -scale, scale, name=name)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 141, in random_uniform_variable
    return variable(np.random.uniform(low=low, high=high, size=shape),
  File "mtrand.pyx", line 1565, in mtrand.RandomState.uniform (numpy/random/mtrand/mtrand.c:17319)
OverflowError: Range exceeds valid bounds
But after referencing this, I added:
from keras import backend as K
K.set_image_dim_ordering('th')
and I was able to execute the code.
Result:
60000/60000 [==============================] - 9s - loss: 0.3006 - acc: 0.9402
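One plausible reading of the traceback, offered as an assumption rather than something confirmed in this thread: if Keras was using 'tf' ordering, the (1, 28, 28) input is treated as 1 row, 28 columns and 28 channels, so every 3x3 'valid' convolution pushes the row count further below zero and Dense ends up initializing weights for a negative input size. The sketch below only does the shape arithmetic; flatten_size is a hypothetical helper, not a Keras function.

```python
def conv_valid(length, kernel=3):
    """Output length of a stride-1 'valid' convolution along one axis."""
    return length - kernel + 1

def flatten_size(input_shape, ordering, n_convs=3, filters=8):
    """Feature count after n_convs 3x3 convolutions followed by Flatten."""
    if ordering == "th":
        # 'th' ordering: (channels, rows, cols)
        rows, cols = input_shape[1], input_shape[2]
    else:
        # 'tf' ordering: (rows, cols, channels)
        rows, cols = input_shape[0], input_shape[1]
    for _ in range(n_convs):
        rows, cols = conv_valid(rows), conv_valid(cols)
    return filters * rows * cols

th_size = flatten_size((1, 28, 28), "th")
tf_size = flatten_size((1, 28, 28), "tf")
print("th:", th_size, "tf:", tf_size)
```

Under 'th' the Dense layer sees a positive number of inputs, while under 'tf' the computed size is negative, which would be consistent with the weight initializer failing.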
Maybe it's because of different ROMs and/or a different version of gym (I'm using 0.5.6). Or maybe because right now I'm on Windows and I had to compile ALE and use lots of workarounds to make atari_py work at all. :)
So I think we found the main issue. You can either use:
from keras import backend as K
K.set_image_dim_ordering('th')
at the beginning of generate_experience_proc and learning_proc, or set it permanently in .keras/keras.json with "image_dim_ordering": "th", if I'm correct.
Give it a try and let me know if that solves the issue. :)
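For the permanent option, the config file is plain JSON, so the key can be changed by hand or with a few lines of Python. A minimal sketch, assuming a Keras 1.x style config; set_image_dim_ordering is a hypothetical helper, and the demo writes to a throwaway file rather than the real ~/.keras/keras.json.

```python
import json
import os
import tempfile

def set_image_dim_ordering(config_path, ordering="th"):
    """Set "image_dim_ordering" in a JSON config file, keeping all other keys."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["image_dim_ordering"] = ordering
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config

# Demo on a temporary file so the real Keras config is not touched.
demo_path = os.path.join(tempfile.mkdtemp(), "keras.json")
with open(demo_path, "w") as f:
    json.dump({"backend": "theano", "image_dim_ordering": "tf"}, f)

updated = set_image_dim_ordering(demo_path, "th")
print(updated)
```

Keras reads this file at import time, so the change only takes effect in processes started afterwards.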
Yes. I will give it a try and let you know!
The 4 actions that you get seem a bit weird tho. If you are using atari-py, it should return 6 actions as per the code here, but if you have compiled using the original ALE here, it has 4 actions as you have got. Maybe I need to update my .so files :) Edit: It is already an issue. Thanks for helping to solve the problem! :)
That's interesting. I wasn't aware of the difference between atari-py and ALE. I sort of compiled ALE and copied it to atari-py to make it work on Windows. :)
Hi,
I just let it run for another 14m epochs, but again I have the same problem! :( The average is still 1.XX, and the image ordering does not seem to be the problem here.
Model after 14m frames: model-Breakout-v0-14000000.h5.zip
Score avg of actor 0: training_log_actor-0.txt
Code I used for training: train_updated_new -code.txt
Unfortunately I could not get the terminal output about loss/entropy/etc. because I closed the terminal before I could copy it. So I resumed from the 14m frames checkpoint and let it run for a few minutes. Here is the output:
Frames: 14005000; Policy-Loss: -0.427857; Avg: -0.019142 --- Value-Loss: 0.000391; Avg: 0.199642 --- Entropy: 0.346546; Avg: 0.346539 --- V-value; Min: 0
Frames: 14010000; Policy-Loss: -0.450229; Avg: -0.013545 --- Value-Loss: 0.000488; Avg: 0.174907 --- Entropy: 0.346533; Avg: 0.346537 --- V-value; Min: 0
Frames: 14015000; Policy-Loss: -0.478949; Avg: -0.012407 --- Value-Loss: 0.000597; Avg: 0.176477 --- Entropy: 0.346534; Avg: 0.346531 --- V-value; Min: 0.210; Max: 0.218; Avg: 0.213
7310> Best: 3; Avg: 1.00; Max: 3
7318> Best: 3; Avg: 1.05; Max: 3
Frames: 14020000; Policy-Loss: -1.874325; Avg: 0.470858 --- Value-Loss: 0.105481; Avg: 0.346848 --- Entropy: 0.346542; Avg: 0.346538 --- V-value; Min: 0.208; Max: 0.215; Avg: 0.211
7315> Best: 4; Avg: 1.24; Max: 4
7312> Best: 5; Avg: 1.35; Max: 5
Frames: 14025000; Policy-Loss: -0.457242; Avg: 0.031370 --- Value-Loss: 0.000610; Avg: 0.175511 --- Entropy: 0.346525; Avg: 0.346534 --- V-value; Min: 0.211; Max: 0.216; Avg: 0.214
At this point, it would be really helpful if someone else could run it on their computer and check it!
Another question: have you only tested your code on Python 3.x? I am testing it only on 2.7, so maybe I should include from __future__ import division?
Edit: I will run again with that change. Even though I don't think it makes any difference, no harm in testing it out.
Edit 2: Good news! I am seeing some learning after about 1.25m epochs. The average score is now 2.XX (which I never saw in any of my previous runs), so I guess it was a Python version compatibility issue! Without the division import, the lr formula would have defaulted to 0.000001, since the other argument to max always evaluates to 0 (1/2 evaluates to 0 on Python 2.7). Anyway, I will let it run for at least 10m epochs and let you know about the results.
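The effect described above can be reproduced in Python 3 by using `//`, which behaves like Python 2's `/` on integers. This is only an illustration: the schedule below is a hypothetical linear decay with a floor of 0.000001, not the actual formula from train.py, and the initial learning rate of 0.001 is made up.

```python
def lr_python2_style(initial_lr, frames, total_frames, min_lr=1e-6):
    """Linear decay as Python 2 would compute it without the division import."""
    # int // int floors the result, so this is 0 for any frames > 0
    remaining = (total_frames - frames) // total_frames
    return max(min_lr, initial_lr * remaining)

def lr_true_division(initial_lr, frames, total_frames, min_lr=1e-6):
    """The same schedule with true division (Python 3, or Python 2 with the import)."""
    remaining = (total_frames - frames) / float(total_frames)
    return max(min_lr, initial_lr * remaining)

# Hypothetical values: 1m frames into an 80m frame run.
broken = lr_python2_style(0.001, 1000000, 80000000)
fixed = lr_true_division(0.001, 1000000, 80000000)
print("python 2 style:", broken)
print("true division: ", fixed)
```

With integer division the decayed term collapses to 0 almost immediately, so max() returns the floor and training proceeds at a learning rate that is far too small to make visible progress.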
Hi,
Thought I would make a separate post for this.
I have trained for 10m frames and the average is now at 14-16! :)
Full log: train_updated_new_v2.py full_log.txt
Actor-0 score averages: training_log_actor-0.txt
10m train steps model: model-Breakout-v0-10000000.h5.zip
Running test of 10 games after 10m frames:
Game # 1; Reward 18;
Game # 2; Reward 15;
Game # 3; Reward 20;
Game # 4; Reward 18;
Game # 5; Reward 18;
Game # 6; Reward 15;
Game # 7; Reward 30;
Game # 8; Reward 25;
Game # 9; Reward 10;
Game # 10; Reward 17;
Note: The above results were obtained using the code with a modified move space of 4 (these changes are in the pull request). I have created a few pull requests to make the code compatible with both Python 2.x and 3.x, plus a few other changes to account for Breakout's move space. Review them if you can! :)
Thank you for helping to resolve this issue! :)
I'm glad it's finally working for you! :)
Thank you guys. The future division import helped me on Python 2.7 as well.
I can also confirm that action_space=4 vs 6 makes no difference; both give good results.
Tested on Keras_V2, gpu.cuda8.cudnn5110 Ubuntu 16.04 and OSX 10.12.3.
Hi,
I am trying to train a Breakout model using your batch-a3c code. I am using a p2.xlarge instance (NVIDIA K80); it has been training for about 20-22 hours and has completed ~33m frames. From your graph, assuming each unit on the x-axis is 10m training steps, at about 30-25m steps your model achieved an average score of at least 30-50. But my model unfortunately barely gets a 10; it's mostly still zeros. I am not sure why this happens.
I am using the same 80m steps and all the default params are the same.
These are the statistics at ~34m frames:
Frames: 34300000; Policy_loss: -0.701535; Value_loss: 0.002712; Entrophy: 0.345996; V-value_avg: 0.439
Also, I should mention that I am using 16 acting agents, so I believe the performance should be much better, correct? Other doubts that I have are:
- Would it help to increase the queue length as the number of agents increases? Wouldn't this diversify the experiences the agent can learn from?
- During my training, I paused and resumed training once (at 6m steps). Since most of the code depends on the network weights and the number of train steps (obtained from the input args), this shouldn't have had any effect on the training, right?
- The total training steps is the one counted in the learn_proc() method, right? Each actor process's train steps progress much slower than the learn_proc() thread. Is this normal?
- There is no stopping condition, so I am assuming the training ran for 80m steps as indicated by the message printed out by LearningAgent's learn() method.
I will let the training continue for another 10m steps and see if there are any performance changes.
Thank you!