I have recently been working through the PRN, CSPNet, and ELAN series of papers. They are tightly related, but I find it quite hard to grasp the core insights behind this line of work. I searched the community but found no answer. Here is a (fundamental) question I have:
It concerns the concepts of gradient "timestamp" and gradient "source" in PRN. Precise definitions of these terms never appear in the paper. My question is: why should we care about the timestamp of a gradient? During each iteration, all we ultimately want is the final gradient with respect to each learnable parameter. If the goal of PRN is to increase the diversity of gradient combinations, wouldn't a formula that writes out the gradient in explicit form be much more lucid and reader-friendly?
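To make my confusion concrete, here is a toy, hand-differentiated sketch (my own example, not taken from the papers): a scalar "network" with a shortcut path and a transformed path. As I currently understand the papers, "gradient sources" would distinguish the per-path gradient terms, whereas ordinary backprop only ever materializes their sum — which is why I do not see what the distinction buys us.

```python
# Toy example (my own, hypothetical): z = y + a*y with y = w*x.
# The total gradient dz/dw is a sum of two path-wise contributions;
# standard backprop reports only the sum.

def forward(w, x, a):
    y = w * x          # shared intermediate feature
    z = y + a * y      # shortcut path (y) plus a transformed path (a*y)
    return y, z

def grad_paths(w, x, a):
    # Chain rule, path by path:
    #   shortcut path:    dz/dy = 1  ->  contribution = 1 * dy/dw = x
    #   transformed path: dz/dy = a  ->  contribution = a * dy/dw = a * x
    g_shortcut = 1.0 * x
    g_transformed = a * x
    return g_shortcut, g_transformed

w, x, a = 2.0, 3.0, 0.5
_, z = forward(w, x, a)
g1, g2 = grad_paths(w, x, a)
total = g1 + g2          # what backprop reports: x * (1 + a)
print(g1, g2, total)     # 3.0 1.5 4.5
```

If the point of PRN is that different channel subsets receive different combinations of such path-wise terms, an explicit formula along these lines would, to me, be far clearer than informal talk of "timestamps".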
@WongKinYiu