PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

A3C grad_parallel gradients addition #34

Closed gowtham1997 closed 5 years ago

gowtham1997 commented 5 years ago

This is regarding line 140 of 02_a3c_grad.py where we are adding the gradients of different processes https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/71617994fbefcf7c432cb229f007942d3b0d450c/Chapter11/02_a3c_grad.py#L136-L141

Here, even though `tgt_grad += grad` is executed, the change strangely does not seem to be reflected in the `grad_buffer` list.

I made a small snippet to test this:


                # creating a copy of grad_buffer before its update
                old_grad_buffer = grad_buffer.copy()
                # a list to check whether elements of old_grad_buffer and
                # updated_grad_buffer are equal
                f = []
                # a new list to store the new added gradients in place
                new_grad_buffer = []
                for tgt_grad, grad in zip(grad_buffer, train_entry):
                    tgt_grad = tgt_grad + grad
                    # add the added gradients to new_grad_buffer
                    new_grad_buffer.append(tgt_grad)

                # comparing the updated grad_buffer and old_grad_buffer
                for tgt_grad, grad in zip(grad_buffer, old_grad_buffer):
                    f.append(np.array_equal(tgt_grad, grad))
                print(any(f))
                f.clear()
                # comparing the new_grad_buffer and old_grad_buffer
                for tgt_grad, grad in zip(old_grad_buffer, new_grad_buffer):
                    f.append(np.array_equal(tgt_grad, grad))
                print(any(f))
Outputs:
>> True
>> False

The above snippet prints True (the updated grad_buffer and old_grad_buffer are equal, i.e. grad_buffer was never updated) and False (old_grad_buffer and new_grad_buffer are not equal, i.e. new_grad_buffer does hold the summed gradients).
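A minimal standalone reproduction of what I think is going on here, with small NumPy arrays standing in for the real gradients (the values are made up for illustration):

```python
import numpy as np

# hypothetical stand-ins for the gradient buffers
grad_buffer = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
train_entry = [np.array([10.0, 20.0]), np.array([30.0, 40.0])]

for tgt_grad, grad in zip(grad_buffer, train_entry):
    # `tgt_grad + grad` creates a brand-new array, and the assignment
    # rebinds the local name tgt_grad to it; the original array object
    # stored inside grad_buffer is never touched
    tgt_grad = tgt_grad + grad

print(grad_buffer[0])  # still [1. 2.] -- the buffer was not updated
```

So with plain `=`, the summed result only exists in the rebound loop variable, which is why it has to be collected into a separate list like new_grad_buffer.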

Something strange happens when I try to use `+=` (`a += b`) instead of adding normally (`a = a + b`):

                # creating a copy of grad_buffer before its update
                old_grad_buffer = grad_buffer.copy()
                # a list to check whether elements of old_grad_buffer and
                # updated_grad_buffer are equal
                f = []
                # a new list to store the new added gradients in place
                new_grad_buffer = []
                for tgt_grad, grad in zip(grad_buffer, train_entry):
                    # the snippet above used tgt_grad = tgt_grad + grad
                    tgt_grad += grad
                    # add the added gradients to new_grad_buffer
                    new_grad_buffer.append(tgt_grad)

                # comparing the updated grad_buffer and old_grad_buffer
                for tgt_grad, grad in zip(grad_buffer, old_grad_buffer):
                    f.append(np.array_equal(tgt_grad, grad))
                print(any(f))
                f.clear()
                # comparing the new_grad_buffer and old_grad_buffer
                for tgt_grad, grad in zip(old_grad_buffer, new_grad_buffer):
                    f.append(np.array_equal(tgt_grad, grad))
                print(any(f))
                f.clear()
Outputs:
>> True
>> True

So I'm not sure why the first snippet works (I even verified the gradient outputs, and they are summed) while the second one doesn't.
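One thing I noticed while debugging, which may explain the `True True` output: NumPy's `+=` mutates the array object in place, and `list.copy()` is only a shallow copy, so `old_grad_buffer` holds references to the very same (mutated) arrays. A small sketch of what I believe happens (again with made-up values):

```python
import numpy as np

grad_buffer = [np.array([1.0, 2.0])]
train_entry = [np.array([10.0, 20.0])]

old_grad_buffer = grad_buffer.copy()  # shallow copy: same array objects

for tgt_grad, grad in zip(grad_buffer, train_entry):
    tgt_grad += grad  # in-place: mutates the array stored in grad_buffer

print(grad_buffer[0])                        # [11. 22.] -- the buffer WAS updated
print(old_grad_buffer[0])                    # [11. 22.] -- the shallow copy sees the same mutation
print(old_grad_buffer[0] is grad_buffer[0])  # True: identical objects
```

If this is right, the `+=` version in the book does sum the gradients into grad_buffer after all, and my comparison against the shallow copy was just misleading — but I'd appreciate confirmation.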

@Shmuma Can you please take a look?