MG2033 / A2C

A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Apache License 2.0

Model doesn't make use of the GPU #4

Closed ahsteven closed 6 years ago

ahsteven commented 6 years ago

I have started training the model on Breakout and it is a little slow. It is only using around 500 MB of GPU memory, and even when increasing the number of environments to 20 the GPU usage stays the same. I think this may be why OpenAI coded their model the way they did; theirs uses around 7 GB, at least for the ACER model. I need to check for A2C.

MG2033 commented 6 years ago

One of the differences between this code and OpenAI's is these lines:

config = tf.ConfigProto(allow_soft_placement=True,
                        intra_op_parallelism_threads=config_args.num_envs,
                        inter_op_parallelism_threads=config_args.num_envs)
config.gpu_options.allow_growth = True

These lines make TensorFlow allocate only the amount of GPU VRAM actually needed to run the model. So, if you increase the number of environments, this amount will not increase, because there is only one train model, and the step model reuses the same variables already allocated on the GPU (reuse=True).
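For reference, here is a minimal sketch (TF 1.x) of the effect of these options when the session is created; num_envs is just a placeholder for the value in config_args:

import tensorflow as tf

num_envs = 4  # placeholder for config_args.num_envs

config = tf.ConfigProto(allow_soft_placement=True,
                        intra_op_parallelism_threads=num_envs,
                        inter_op_parallelism_threads=num_envs)
config.gpu_options.allow_growth = True  # allocate GPU VRAM on demand instead of reserving it all

sess = tf.Session(config=config)
# With allow_growth=True, nvidia-smi reports only the memory the graph actually needs
# (e.g. the ~500 MB you observed); without it, TensorFlow reserves nearly the whole GPU up front.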

Have you compared the time for convergence between my code and theirs? I think they should be the same.

ahsteven commented 6 years ago

Well, I trained the model for 2,000,000 iterations, which has taken several days on a GTX 1070. When I run the test code to view the performance, it still looks like a random agent.

How do you recommend comparing the time for convergence?

I have not been able to figure out how to view the results of the openai version.

MG2033 commented 6 years ago

I can't understand why you are having this convergence problem. I've trained the same model with the same parameters on Pong and Breakout, and it converged properly. On Breakout it reached scores in the 800s, and on Pong it got approximately 19. Could you please give me a screenshot of the TensorBoard visualization? It's saved automatically in the experiments directory.
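(If you haven't opened it yet, pointing TensorBoard at that directory, e.g. tensorboard --logdir <experiment_dir>, should bring up the reward and loss curves.)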

ahsteven commented 6 years ago

Ok, something strange is happening. Here is the output of TensorBoard. I wonder if hitting Ctrl+Z is throwing things off. Now when I just run the test code creating the mpeg video, the performance is really bad. However, from the TensorBoard plot it looks like the performance would have been good at some points.

MG2033 commented 6 years ago

Where is the output? It's not visible. If you want to pause the code to resume it later, press CTRL+C, not CTRL+Z.

ahsteven commented 6 years ago

[screenshot: selection_002 (TensorBoard output)]

MG2033 commented 6 years ago

This is very strange and very rare. The model suddenly drops in performance right at the point where you want to measure it, and the performance you see is already bad. The drop is inexplicable to me, as I only tested the model with at most 4 parallel agents.

My advice is not to interrupt the code during training except with CTRL+C, which allows a checkpoint to be saved.
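For illustration, a minimal sketch of that CTRL+C behavior (TF 1.x); the variable names and checkpoint path below are placeholders, not necessarily what this repo uses:

import tensorflow as tf

dummy = tf.get_variable('dummy', shape=[1])  # stand-in for the A2C model variables
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 0
    try:
        for step in range(1000000):
            pass  # one A2C update (collect rollouts, apply gradients) would run here
    except KeyboardInterrupt:  # CTRL+C raises this in the main process...
        print('Saving model...')
        saver.save(sess, './model.ckpt', global_step=step)
        print('Model saved')  # ...so a checkpoint is written before exiting
# CTRL+Z, in contrast, only suspends the process: no exception is raised and nothing is saved.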

ahsteven commented 6 years ago

So I started training again with 4 environments. I used Ctrl+C to stop training, and I get an error which might indicate a problem. After this, the TensorBoard reward drops to zero after reloading the checkpoint.

Do you see something similar after using Ctrl+C?

Saving model...
Process Process-2:
Process Process-3:
Process Process-1:
Process Process-4:
Traceback (most recent call last):
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "/home/teves/A2C/envs/subproc_vec_env.py", line 10, in worker
    cmd, data = remote.recv()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
KeyboardInterrupt
  File "/home/teves/A2C/envs/subproc_vec_env.py", line 10, in worker
    cmd, data = remote.recv()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/teves/A2C/envs/subproc_vec_env.py", line 12, in worker
    ob, reward, done, info = env.step(data)
  File "/home/teves/A2C/envs/gym_env.py", line 24, in step
    observation, reward, done, info = self.env.step(data)
  File "/home/teves/gym/gym/core.py", line 330, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/teves/gym/gym/core.py", line 314, in step
    return self.observation(observation), reward, done, info
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/teves/gym/gym/core.py", line 322, in observation
    return self._observation(observation)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/teves/A2C/envs/atari_wrappers.py", line 128, in _observation
    frame = np.dot(obs.astype('float32'), np.array([0.299, 0.587, 0.114], 'float32'))
  File "/home/teves/A2C/envs/subproc_vec_env.py", line 12, in worker
    ob, reward, done, info = env.step(data)
  File "/home/teves/A2C/envs/gym_env.py", line 24, in step
    observation, reward, done, info = self.env.step(data)
  File "/home/teves/gym/gym/core.py", line 330, in step
    observation, reward, done, info = self.env.step(action)
KeyboardInterrupt
  File "/home/teves/gym/gym/core.py", line 314, in step
    return self.observation(observation), reward, done, info
  File "/home/teves/gym/gym/core.py", line 322, in observation
    return self._observation(observation)
  File "/home/teves/A2C/envs/atari_wrappers.py", line 128, in _observation
    frame = np.dot(obs.astype('float32'), np.array([0.299, 0.587, 0.114], 'float32'))
KeyboardInterrupt
Model saved

[screenshot: TensorBoard reward dropping to zero after reloading the checkpoint]

MG2033 commented 6 years ago

The interrupt itself is perfectly fine. However, the fact that training starts from zero again after the interrupt gives us a lead. I'll run the training myself again and tell you what I get.

MG2033 commented 6 years ago

There was a bug in the model saver. I have fixed it. You should be able to run properly now. Thank you for pointing that out.

If there are any more issues, feel free to tell me.

ahsteven commented 6 years ago

Hi Mostafa, I trained for 1 million iterations and the results were good. However, it threw an error when it went into the testing loop immediately following training. If I restart the program and just run the test, it works fine.

100%|██████████████████████████████| 1000000/1000000 [10:37:08<00:00, 26.16it/s]
Iteration:1000000 - loss: 0.000770 - policy_entropy: 1.212680 - fps: 2092.0
Saving model...
Model saved
1000001it [10:37:08, 26.16it/s]
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.uint8'>. Please provide explicit dtype.
Traceback (most recent call last):
  File "main.py", line 32, in <module>
    main()
  File "main.py", line 28, in main
    a2c.test(total_timesteps=10000)
  File "/home/teves/A2C/A2C.py", line 56, in test
    self.model.build(observation_space_shape, action_space_n)
  File "/home/teves/A2C/models/model.py", line 97, in build
    self.init_network()
  File "/home/teves/A2C/models/model.py", line 67, in init_network
    is_training=False)
  File "/home/teves/A2C/models/cnn_policy.py", line 18, in __init__
    is_training=is_training)
  File "/home/teves/A2C/layers.py", line 132, in conv2d
    initializer=initializer, l2_strength=l2_strength, bias=bias)
  File "/home/teves/A2C/layers.py", line 29, in conv2d_p
    w = variable_with_weight_decay(kernel_shape, initializer, l2_strength)
  File "/home/teves/A2C/layers.py", line 384, in variable_with_weight_decay
    w = tf.get_variable('weights', kernel_shape, tf.float32, initializer=initializer)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1262, in get_variable
    constraint=constraint)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1097, in get_variable
    constraint=constraint)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
    constraint=constraint)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/home/teves/anaconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 743, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable policy/conv1/weights already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "/home/teves/A2C/layers.py", line 384, in variable_with_weight_decay
    w = tf.get_variable('weights', kernel_shape, tf.float32, initializer=initializer)
  File "/home/teves/A2C/layers.py", line 29, in conv2d_p
    w = variable_with_weight_decay(kernel_shape, initializer, l2_strength)
  File "/home/teves/A2C/layers.py", line 132, in conv2d
    initializer=initializer, l2_strength=l2_strength, bias=bias)

MG2033 commented 6 years ago

Ignore that error for now. It happens because I made train() and test() standalone, so when test() runs right after training it finds the model already allocated on the GPU. This is fine :D
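To make it concrete, here is a minimal sketch (TF 1.x) of why that ValueError appears when test() rebuilds the network in the same graph; only the scope name comes from the traceback, the kernel shape is illustrative:

import tensorflow as tf

def build_policy():
    with tf.variable_scope('policy'):
        with tf.variable_scope('conv1'):
            return tf.get_variable('weights', shape=[8, 8, 4, 32])

w_train = build_policy()     # first build (train()): creates policy/conv1/weights

try:
    w_test = build_policy()  # second build (test()) in the same graph, without reuse
except ValueError as e:
    print(e)                 # "Variable policy/conv1/weights already exists, disallowed. ..."

# Opening the scope with reuse=True (or calling tf.reset_default_graph() before
# rebuilding) avoids the error by sharing the existing variables:
with tf.variable_scope('policy', reuse=True):
    with tf.variable_scope('conv1'):
        w_shared = tf.get_variable('weights')  # returns the variable created above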

ahsteven commented 6 years ago

Hi Mostafa,

I noticed that the video I created with the trained model reaches a score of over 600. What is concerning is that the speed of the ball never seems to increase, and once all the blocks have been broken it just continues to knock the ball around. There seems to be no change of level or reset after the blocks are broken. Is this normal behavior?

MG2033 commented 6 years ago

I think it's normal. This behavior comes from the OpenAI Gym Atari environment itself.

ahsteven commented 6 years ago

Thanks for all your work on this and answering my questions. Your work is extremely helpful.

MG2033 commented 6 years ago

You're welcome.