Hypergradient Calculation Different from the Paper?

On page 3 of the paper, Algorithm 1 states that if (x_i, y_i) is not in the current batch, we add (N/B)*g_t_i to the previous step's moment gradient scaled by the momentum and the previous step's hypergradient scaled by the regularization coefficient.

However, when I look at the HydraHook class, I see that the instance gradient (N/B)*g_t_i is included if the index is part of the current batch and left out if it is not. This seems to be the opposite of what Algorithm 1 suggests, and I hope you can help me figure out what is going on.
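
To make the discrepancy concrete, here is a minimal sketch of the two readings as I understand them. The function names are made up for illustration and do not come from the repository or the paper, and the momentum and regularization terms are left out because they are not the part in question.

```python
def term_as_i_read_algorithm_1(index, batch_indices, g_i, N, B):
    """My reading of Algorithm 1: add (N/B)*g_i when the sample is NOT in the batch."""
    if index not in batch_indices:
        return (N / B) * g_i
    return 0.0


def term_as_i_read_hydrahook(index, batch_indices, g_i, N, B):
    """My reading of HydraHook: add (N/B)*g_i when the sample IS in the batch."""
    if index in batch_indices:
        return (N / B) * g_i
    return 0.0


# Hypothetical numbers: dataset size N, batch size B, one scalar instance gradient.
N, B, g_i = 10, 2, 1.5
batch_indices = {0, 3}

for index in (0, 1):
    paper = term_as_i_read_algorithm_1(index, batch_indices, g_i, N, B)
    code = term_as_i_read_hydrahook(index, batch_indices, g_i, N, B)
    print(index, paper, code)
```

For any given index, exactly one of the two readings contributes the instance gradient, which is why they look opposite to me.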

Thanks!

cyyever / aaai_hydra