cyyever / aaai_hydra

Hypergradient calculation different from the paper? #2

Open JohnnyC08 opened 2 years ago

JohnnyC08 commented 2 years ago

On page 3 of the paper, Algorithm 1 states that if (x_i, y_i) is not in the current batch, we add (N/B)*g_t_i to the sum of the previous step's moment gradient scaled by the momentum and the previous step's hypergradient scaled by the regularization coefficient.

However, when I look at the HydraHook class, I see that the instance gradient (N/B)*g_t_i is included if the index is part of the current batch and omitted if it is not. This seems to be the opposite of what Algorithm 1 says, and I hope you can help me figure out what is going on.

Thanks!

cyyever commented 9 months ago

@JohnnyC08 There is an error in the pseudocode: the condition should be that (x_i, y_i) is in the current batch. I just noticed this issue.
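
For reference, here is a minimal Python sketch of the update as it is described in this thread, with the corrected condition. It is not the HydraHook code: the function name, the argument names, and the use of plain lists of floats are assumptions made for illustration, and only the terms named in the question are modeled (the momentum-scaled previous term, the regularization-scaled previous hypergradient, and the (N/B)*g_t_i instance term).

```python
def update_momentum_hypergradient(
    prev_momentum_term,   # previous step's moment gradient for sample i (hypothetical name)
    prev_hypergradient,   # previous step's hypergradient for sample i (hypothetical name)
    instance_gradient,    # g_t_i, the gradient of sample i at step t
    in_batch,             # True if (x_i, y_i) is in the current batch
    momentum,
    weight_decay,         # the regularization coefficient
    n_samples,            # N
    batch_size,           # B
):
    """Return the new moment gradient for sample i, one value per parameter."""
    # Terms that are applied at every step, whether or not sample i was drawn.
    new_term = [
        momentum * m + weight_decay * h
        for m, h in zip(prev_momentum_term, prev_hypergradient)
    ]
    # Corrected condition: the instance gradient is added only when the
    # sample IS in the current batch.
    if in_batch:
        scale = n_samples / batch_size
        new_term = [v + scale * g for v, g in zip(new_term, instance_gradient)]
    return new_term
```

Under this reading, a sample that is not in the batch still has its tracked term decayed by the momentum and regularization terms, but receives no (N/B)*g_t_i contribution at that step, which matches the behavior described for HydraHook above.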