Hi,

When we collect gradients in the gradient buffer between lines 136 and 140, what is the reason for the new tgt_grad variable?
For example, why can't we simply replace this with:

    if grad_buffer is None:
        grad_buffer = train_entry
    else:
        grad_buffer += train_entry

Incidentally, with the original code I could not get convergence, but with the above everything worked fine. (I only tried once, so this could just be a lucky seed.)
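To make the question concrete, here is a minimal sketch of the difference I have in mind. It assumes train_entry is a Python list holding one gradient per parameter, which I have not confirmed against the repository; plain floats stand in for the real gradient tensors, and the function names are made up for illustration. If train_entry is actually a single tensor or array, `+=` would add elementwise and the two variants would agree.

```python
# Hypothetical stand-ins: each entry is a list with one gradient per parameter.
# Plain floats are used here in place of real gradient tensors (an assumption).
entries = [[1.0, 2.0], [10.0, 20.0]]


def accumulate_with_list_iadd(entries):
    """Variant from the question: `+=` applied to the buffer itself."""
    grad_buffer = None
    for train_entry in entries:
        if grad_buffer is None:
            # Copy so the caller's list is not mutated by the later `+=`.
            grad_buffer = list(train_entry)
        else:
            # On Python lists, `+=` extends the list; it does not sum elements.
            grad_buffer += train_entry
    return grad_buffer


def accumulate_elementwise(entries):
    """Per-parameter accumulation: add matching gradients pairwise."""
    grad_buffer = None
    for train_entry in entries:
        if grad_buffer is None:
            grad_buffer = list(train_entry)
        else:
            grad_buffer = [b + g for b, g in zip(grad_buffer, train_entry)]
    return grad_buffer


concat_result = accumulate_with_list_iadd(entries)
summed_result = accumulate_elementwise(entries)
print(concat_result)
print(summed_result)
```

Under that assumption, the simplified version would grow the buffer instead of summing gradients, which might be what the extra variable is there to avoid.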
Cheers, Jamie