Chapter 11: 02_a3c_grad.py

Hi,

When we collect gradients in the gradient buffer between lines 136 and 140, what is the reason for the new tgt_grad variable?

For example, why can't we simply replace this with:

        if grad_buffer is None:
            grad_buffer = train_entry
        else:
            grad_buffer += train_entry

Incidentally, I could not get convergence with the original code, but with the change above everything worked fine. (I only tried once, so this could just be a lucky seed.)
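
One thing worth checking when comparing the two versions is what `+=` does for the container type involved. The sketch below assumes, hypothetically, that both `grad_buffer` and `train_entry` are plain Python lists holding one gradient per parameter. Each gradient is modelled here as a list of floats rather than whatever array type the script actually uses, and the function names are made up for illustration. Under that assumption, `+=` on the outer list appends the new entries instead of summing them, whereas looping over pairs sums each gradient in place.

```python
import copy


def accumulate_elementwise(grad_buffer, train_entry):
    """Sum each gradient in train_entry into the matching one in grad_buffer."""
    if grad_buffer is None:
        # Copy so later in-place sums do not modify the caller's entry.
        return copy.deepcopy(train_entry)
    for tgt_grad, grad in zip(grad_buffer, train_entry):
        for i, value in enumerate(grad):
            tgt_grad[i] += value
    return grad_buffer


def accumulate_with_list_iadd(grad_buffer, train_entry):
    """Use += on the outer list, as in the proposed replacement."""
    if grad_buffer is None:
        return copy.deepcopy(train_entry)
    # For Python lists, += extends the list; it does not add element-wise.
    grad_buffer += train_entry
    return grad_buffer


# Two hypothetical "parameters": one with two values, one with a single value.
first_entry = [[1.0, 2.0], [3.0]]
second_entry = [[10.0, 20.0], [30.0]]

summed = accumulate_elementwise(None, first_entry)
summed = accumulate_elementwise(summed, second_entry)

extended = accumulate_with_list_iadd(None, first_entry)
extended = accumulate_with_list_iadd(extended, second_entry)

print(summed)         # per-parameter sums, still two entries
print(len(extended))  # the outer list has grown instead
```

If the buffer were a single array rather than a list of arrays, `+=` would add element-wise and the two versions would behave the same, so the outcome depends on the actual type of `train_entry`.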

Cheers, Jamie

PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Chapter 11: 02_a3c_grad.py #45