Closed ahsteven closed 6 years ago
One of the differences between this code and OpenAI's is these lines:
config = tf.ConfigProto(allow_soft_placement=True, intra_op_parallelism_threads=config_args.num_envs, inter_op_parallelism_threads=config_args.num_envs)
config.gpu_options.allow_growth = True
These lines make TensorFlow allocate only as much GPU VRAM as the model actually needs. So, if you increase the number of environments, this amount will not increase noticeably, because the train model is a single model and the step model is the same model already allocated on the GPU (as reuse=True).
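To make the memory argument concrete, here is a rough sketch of the arithmetic. The layer shapes below are hypothetical (a small Atari-style CNN chosen for illustration, not read from the repository), and `parameter_bytes` and `observation_bytes` are made-up helper names. It only shows that weight memory does not depend on the number of environments, while the observation batch grows linearly with it.

```python
# Hypothetical layer shapes (height, width, in_channels, out_channels) for a
# small Atari-style CNN; these are illustrative, not read from the repository.
CONV_LAYERS = [(8, 8, 4, 32), (4, 4, 32, 64), (3, 3, 64, 64)]
DENSE_LAYERS = [(3136, 512)]  # (inputs, outputs)
BYTES_PER_FLOAT32 = 4


def parameter_bytes():
    """Memory for weights and biases; independent of the number of environments."""
    count = 0
    for h, w, c_in, c_out in CONV_LAYERS:
        count += h * w * c_in * c_out + c_out
    for n_in, n_out in DENSE_LAYERS:
        count += n_in * n_out + n_out
    return count * BYTES_PER_FLOAT32


def observation_bytes(num_envs, frame_shape=(84, 84, 4)):
    """Memory for one batch of uint8 observations; linear in num_envs."""
    h, w, c = frame_shape
    return num_envs * h * w * c


for num_envs in (4, 20):
    # The parameter figure is the same on every row; only the batch changes.
    print(num_envs, parameter_bytes(), observation_bytes(num_envs))
```

Under these assumptions the shared weights take a few megabytes regardless of how many environments feed the model, which is consistent with GPU usage staying roughly flat as environments are added.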
Have you compared the time for convergence between my code and theirs? I think they should be the same.
Well, I trained the model for 2,000,000 iterations, which has taken several days on a GTX 1070. When I run the test code to view the performance, it still looks like a random agent.
How do you recommend comparing the time for convergence?
I have not been able to figure out how to view the results of the openai version.
I can't understand why the model is not converging for you. I've trained the same model with the same parameters on Pong and Breakout and it converged properly. On Breakout it reached a score in the 800s, and on Pong approximately 19. Could you please give me a screenshot of the TensorBoard visualization? It's saved automatically in the experiments directory.
OK, something strange is happening. Here is the output of TensorBoard. I wonder if hitting Ctrl+Z is throwing things off. Now when I just run the test code that creates the MPEG video, the performance is really bad. However, from TensorBoard it looks like the performance would have been good at some points.
Where is the output? It's not visible. If you want to pause the code and resume it later, press CTRL+C, not CTRL+Z.
This is very strange and very rare. The model's performance suddenly drops just when you want to measure it, so the measured performance is bad. I cannot explain the drop, as I have only tested the model with at most 4 parallel agents.
My advice is not to interrupt the code during training except with CTRL+C, which allows a checkpoint to be saved.
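For readers wondering why the two keys differ: Ctrl+C raises KeyboardInterrupt inside Python, which a training loop can catch in order to save, whereas Ctrl+Z suspends the process without giving it that chance. Below is a minimal sketch of the pattern, not the repository's actual trainer. The `train` function, the JSON checkpoint format, and the `interrupt_at` parameter (which only simulates Ctrl+C for the demo) are all assumptions.

```python
import json
import os
import tempfile


def train(num_iterations, checkpoint_path, interrupt_at=None):
    """Toy training loop that checkpoints on Ctrl+C and resumes later."""
    # Resume from the last checkpoint if one exists, including the step
    # counter, so training does not silently restart from zero.
    state = {"step": 0, "weight": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    try:
        while state["step"] < num_iterations:
            if interrupt_at is not None and state["step"] == interrupt_at:
                raise KeyboardInterrupt  # simulates pressing Ctrl+C
            state["weight"] += 0.5  # stand-in for a gradient update
            state["step"] += 1
    except KeyboardInterrupt:
        print("Interrupted, saving checkpoint...")
    finally:
        # Write to a temporary file first, then rename it into place, so an
        # interrupted save does not leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first = train(10, checkpoint, interrupt_at=4)   # stops early and saves
second = train(10, checkpoint)                  # resumes from the saved step
print(first["step"], second["step"])
```

Saving the step counter together with the weights is what lets the second call continue where the first one stopped.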
So I started training again with 4 environments and used Ctrl+C to stop training. I get an error which might indicate a problem. After this, the TensorBoard reward drops to zero once the checkpoint is reloaded.
Do you see something similar after using Ctrl+C?
Saving model...
Process Process-2:
Process Process-3:
Process Process-1:
Process Process-4:
Traceback (most recent call last):
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
File "/home/teves/A2C/envs/subproc_vec_env.py", line 10, in worker
cmd, data = remote.recv()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
KeyboardInterrupt
File "/home/teves/A2C/envs/subproc_vec_env.py", line 10, in worker
cmd, data = remote.recv()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/teves/A2C/envs/subproc_vec_env.py", line 12, in worker
ob, reward, done, info = env.step(data)
File "/home/teves/A2C/envs/gym_env.py", line 24, in step
observation, reward, done, info = self.env.step(data)
File "/home/teves/gym/gym/core.py", line 330, in step
observation, reward, done, info = self.env.step(action)
File "/home/teves/gym/gym/core.py", line 314, in step
return self.observation(observation), reward, done, info
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/home/teves/gym/gym/core.py", line 322, in observation
return self._observation(observation)
File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/teves/A2C/envs/atari_wrappers.py", line 128, in _observation
frame = np.dot(obs.astype('float32'), np.array([0.299, 0.587, 0.114], 'float32'))
File "/home/teves/A2C/envs/subproc_vec_env.py", line 12, in worker
ob, reward, done, info = env.step(data)
File "/home/teves/A2C/envs/gym_env.py", line 24, in step
observation, reward, done, info = self.env.step(data)
File "/home/teves/gym/gym/core.py", line 330, in step
observation, reward, done, info = self.env.step(action)
KeyboardInterrupt
File "/home/teves/gym/gym/core.py", line 314, in step
return self.observation(observation), reward, done, info
File "/home/teves/gym/gym/core.py", line 322, in observation
return self._observation(observation)
File "/home/teves/A2C/envs/atari_wrappers.py", line 128, in _observation
frame = np.dot(obs.astype('float32'), np.array([0.299, 0.587, 0.114], 'float32'))
KeyboardInterrupt
Model saved
The interrupt is perfectly fine. However, because it starts from zero again after interrupting it, that gives us a lead. I'll run the training myself again and tell you what I got.
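For context on the tracebacks above: pressing Ctrl+C in a terminal delivers SIGINT to every process in the foreground process group, so each environment worker raises KeyboardInterrupt alongside the parent. If the noise is unwanted, one option is to have workers ignore SIGINT and wait for an explicit close command. The sketch below is an assumption-laden illustration, not the repository's `subproc_vec_env.py`: the `step`/`close` commands, the arithmetic stand-in for `env.step`, and `run_demo` are hypothetical, and the `fork` start method assumes a Unix system.

```python
import multiprocessing as mp
import os
import signal


def worker(remote):
    # Ignore SIGINT in the child so Ctrl+C only interrupts the parent,
    # which can then shut the workers down in an orderly way.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            remote.send(data + 1)  # stand-in for env.step(data)
        elif cmd == "close":
            remote.close()
            break


def run_demo():
    ctx = mp.get_context("fork")  # Unix only
    parent_end, child_end = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_end,), daemon=True)
    proc.start()
    child_end.close()  # the parent keeps only its own end of the pipe
    try:
        # First round trip guarantees the worker has installed SIG_IGN.
        parent_end.send(("step", 1))
        parent_end.recv()
        # Simulate Ctrl+C reaching the worker; it should be ignored.
        os.kill(proc.pid, signal.SIGINT)
        parent_end.send(("step", 41))
        result = parent_end.recv()
    finally:
        parent_end.send(("close", None))
        proc.join(timeout=5)
    return result, proc.exitcode
```

Calling `run_demo()` returns the worker's reply after the simulated interrupt together with its exit code, so a clean exit shows the signal never reached the step loop.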
There was a bug in the model saver. I have fixed it. You should be able to run properly now. Thank you for pointing that out.
If there are any more issues, feel free to tell me.
Hi Mostafa, I trained for 1 million iterations and the results were good. However, it threw an error when it went into the testing loop immediately following training. I can restart the program and run just the testing without problems.
100%|██████████████████████████████| 1000000/1000000 [10:37:08<00:00, 26.16it/s]Iteration:1000000 - loss: 0.000770 - policy_entropy: 1.212680 - fps: 2092.0
Saving model...
Model saved
1000001it [10:37:08, 26.16it/s]
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.uint8'>. Please provide explicit dtype.
Traceback (most recent call last):
File "main.py", line 32, in
File "/home/teves/A2C/layers.py", line 384, in variable_with_weight_decay
w = tf.get_variable('weights', kernel_shape, tf.float32, initializer=initializer)
File "/home/teves/A2C/layers.py", line 29, in conv2d_p
w = variable_with_weight_decay(kernel_shape, initializer, l2_strength)
File "/home/teves/A2C/layers.py", line 132, in conv2d
initializer=initializer, l2_strength=l2_strength, bias=bias)
Ignore that error for now. This is because I made the train() and the test() standalone. So, it finds the model preallocated in the GPU. This is fine :D
Hi Mostafa,
I noticed that the video I created with the trained model generates a score of over 600. What is concerning is that the speed of the block never seems to increase and once all the blocks have been broken it just continues to hit around the ball. There seems to be no change of level or reset after the blocks are broken. Is this normal behavior?
I think it's normal. This behavior probably comes from the OpenAI Gym Atari environment itself.
Thanks for all your work on this and answering my questions. Your work is extremely helpful.
You're welcome.
I have started training the model on Breakout and it is a little slow. It is using only around 500 MB of GPU memory. Even when I increase the number of environments to 20, GPU usage stays the same. I think this may be the reason OpenAI coded their model the way they did: theirs uses at least around 7 GB for the ACER model. I still need to check for A2C.