PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Chapter 11: 02_a3c_grad.py #45

Open JTatts opened 5 years ago

Hi,

When we collect gradients in the gradient buffer between lines 136 and 140, what is the reason for the new tgt_grad variable?

For example, why can't we simply replace this with:

        if grad_buffer is None:
            grad_buffer = train_entry
        else:
            grad_buffer += train_entry

Incidentally, with the original code I could not get convergence, but with the above everything worked fine. (I only tried once, so this could just be a lucky seed.)

Cheers, Jamie
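
One thing that may be worth checking when comparing the two variants: the result of `+=` depends on the type of `grad_buffer`. The sketch below assumes `train_entry` is a plain Python list holding one NumPy gradient array per parameter. The issue does not state this, so treat it as an assumption; the helper names `accumulate_elementwise` and `accumulate_with_list_iadd` are hypothetical and exist only for this illustration. Under that assumption, `+=` on the list appends the new arrays rather than summing them, whereas a per-array loop (the role a variable like tgt_grad would play) adds them in place.

```python
import numpy as np


def accumulate_elementwise(grad_buffer, train_entry):
    """Sum each array in train_entry into the matching array in grad_buffer."""
    if grad_buffer is None:
        # Copy so later in-place adds do not modify the caller's arrays
        # (the copy is a choice made for this sketch).
        return [g.copy() for g in train_entry]
    for tgt_grad, grad in zip(grad_buffer, train_entry):
        tgt_grad += grad  # in-place add on the NumPy array itself
    return grad_buffer


def accumulate_with_list_iadd(grad_buffer, train_entry):
    """Same shape as the proposed replacement, applied to Python lists."""
    if grad_buffer is None:
        return list(train_entry)
    grad_buffer += train_entry  # list += list extends the list, it does not sum
    return grad_buffer


# Two fake "workers", each sending gradients for two parameters.
entries = [
    [np.ones(2), np.ones(3)],
    [np.full(2, 2.0), np.full(3, 2.0)],
]

buf_a = None
buf_b = None
for entry in entries:
    buf_a = accumulate_elementwise(buf_a, entry)
    buf_b = accumulate_with_list_iadd(buf_b, entry)

print(len(buf_a), buf_a[0])  # 2 [3. 3.]
print(len(buf_b), buf_b[0])  # 4 [1. 1.]
```

If the buffer is later paired with the model parameters through something like `zip`, the longer list would be silently truncated and only the first entry's gradients would be used. That could change training behaviour, although whether it explains the convergence difference reported here is only a guess. If `train_entry` is instead a single array or tensor, `+=` is already elementwise and the two variants should be equivalent.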