In the gradient-accumulation loop, even though tgt_grad += grad is executed, the change strangely does not seem to show up in the grad_buffer list.
I wrote a small snippet to test this:
import numpy as np

# create a copy of grad_buffer before it is updated
old_grad_buffer = grad_buffer.copy()
# a list to record whether the elements of old_grad_buffer and
# the updated grad_buffer are equal
f = []
# a new list to store the summed gradients
new_grad_buffer = []
for tgt_grad, grad in zip(grad_buffer, train_entry):
    tgt_grad = tgt_grad + grad
    # add the summed gradients to new_grad_buffer
    new_grad_buffer.append(tgt_grad)
# compare the updated grad_buffer with old_grad_buffer
for tgt_grad, grad in zip(grad_buffer, old_grad_buffer):
    f.append(np.array_equal(tgt_grad, grad))
print(any(f))
f.clear()
# compare new_grad_buffer with old_grad_buffer
for tgt_grad, grad in zip(old_grad_buffer, new_grad_buffer):
    f.append(np.array_equal(tgt_grad, grad))
print(any(f))
Outputs:
True
False
The snippet above prints True (the updated grad_buffer and old_grad_buffer are equal) and False (old_grad_buffer and new_grad_buffer are not equal).
Something strange happens when I use += (a += b) instead of plain addition (a = a + b):
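One possible reading of this first result, sketched under the assumption that the buffers hold NumPy arrays (the names buffer and entry and their values are made up for illustration): tgt_grad = tgt_grad + grad builds a new array and rebinds the loop variable, so the arrays stored in the list are left untouched.

```python
import numpy as np

# Hypothetical stand-ins for grad_buffer and train_entry
buffer = [np.zeros(3), np.zeros(2)]
entry = [np.ones(3), np.ones(2)]

summed = []
for tgt, g in zip(buffer, entry):
    # Plain addition allocates a new array and rebinds the name `tgt`;
    # the array still referenced by `buffer` is not modified.
    tgt = tgt + g
    summed.append(tgt)

# The list still holds the original zero arrays
buffer_unchanged = all(np.array_equal(b, np.zeros_like(b)) for b in buffer)
# Every summed array is a different object from the one in the list
summed_is_new = all(s is not b for s, b in zip(summed, buffer))
print(buffer_unchanged, summed_is_new)
```

If that reading is right, the sums in the first snippet exist only in new_grad_buffer, while grad_buffer itself keeps its old values.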
# create a copy of grad_buffer before it is updated
old_grad_buffer = grad_buffer.copy()
# a list to record whether the elements of old_grad_buffer and
# the updated grad_buffer are equal
f = []
# a new list to store the summed gradients
new_grad_buffer = []
for tgt_grad, grad in zip(grad_buffer, train_entry):
    # the snippet above used tgt_grad = tgt_grad + grad
    tgt_grad += grad
    # add the summed gradients to new_grad_buffer
    new_grad_buffer.append(tgt_grad)
# compare the updated grad_buffer with old_grad_buffer
for tgt_grad, grad in zip(grad_buffer, old_grad_buffer):
    f.append(np.array_equal(tgt_grad, grad))
print(any(f))
f.clear()
# compare new_grad_buffer with old_grad_buffer
for tgt_grad, grad in zip(old_grad_buffer, new_grad_buffer):
    f.append(np.array_equal(tgt_grad, grad))
print(any(f))
f.clear()
Outputs:
True
True
So I'm not sure why the first snippet works (I even verified the gradient outputs, and they are summed) while the second one doesn't.
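A possible explanation, sketched under the assumption that the buffers hold NumPy arrays (buffer, entry, and their values are hypothetical stand-ins): list.copy() is a shallow copy, so old_grad_buffer holds the same array objects as grad_buffer. Since += updates those arrays in place, the snapshot changes along with them and the comparison cannot detect the update. A snapshot taken with copy.deepcopy stays independent.

```python
import copy
import numpy as np

# Hypothetical stand-ins for grad_buffer and train_entry
buffer = [np.zeros(3), np.zeros(2)]
entry = [np.ones(3), np.ones(2)]

shallow = buffer.copy()        # new list, but the same array objects
deep = copy.deepcopy(buffer)   # new list with independent array copies

for tgt, g in zip(buffer, entry):
    # In-place addition mutates the array that `buffer` references
    tgt += g

# The shallow snapshot shares its arrays with `buffer`, so it follows the update
shares_arrays = all(s is b for s, b in zip(shallow, buffer))
shallow_equal = all(np.array_equal(s, b) for s, b in zip(shallow, buffer))
# The deep snapshot still holds the values from before the update
deep_equal = all(np.array_equal(d, b) for d, b in zip(deep, buffer))
print(shares_arrays, shallow_equal, deep_equal)
```

Under these assumptions the in-place version does update the list, and it is the shallow snapshot that makes both comparisons come out equal.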
This is regarding line 140 of 02_a3c_grad.py, where we add up the gradients from the different processes: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/71617994fbefcf7c432cb229f007942d3b0d450c/Chapter11/02_a3c_grad.py#L136-L141
@Shmuma Can you please take a look?