Closed alextzik closed 2 years ago
Sorry, I don't see the issue. The forms are the same - one can simply shift the index on each side by one. @mykelk what do you advise?
Hello, I don't think the two forms are exactly the same: both use a^(k), but one computes x^(k+1) while the other computes x^(k).
Ah, I see. This was a tricky one! I updated the equations for hypergradient descent. This should both match the code and be consistent with Eq 4.1.
Great! Thanks!
The text refers to eq. (4.1), which states x^(k+1) = x^(k) + a^(k) d^(k). However, the equations on pg. 82 only make sense for the indexing x^(k) = x^(k-1) + a^(k) d^(k-1): given x^(k), we calculate df(x^(k))/da = (df/dx evaluated at x^(k)) (-g^(k-1)) and then compute x^(k+1) = x^(k) + a^(k+1) d^(k), where a^(k+1) is given by eq. (5.37).
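To make the indexing concrete, here is a minimal Python sketch of the ordering described above: compute g^(k) at x^(k), update the step size using g^(k) and g^(k-1), then step with the new a^(k+1). It is not the book's code. The function name, the default values of `alpha0` and `mu`, and the quadratic test objective are all assumptions chosen for illustration, and the step-size update a^(k+1) = a^(k) + mu g^(k).g^(k-1) is simply what follows from descending along the derivative df(x^(k))/da = g^(k).(-g^(k-1)) stated above.

```python
import numpy as np

def hypergradient_descent(grad, x0, alpha0=0.01, mu=1e-4, steps=100):
    """Gradient descent with a step size that is itself updated by gradient descent.

    Indexing: x^(k+1) = x^(k) + a^(k+1) d^(k), with d^(k) = -g^(k).
    All names and defaults here are illustrative assumptions.
    """
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    # g^(k-1); starting at zero leaves alpha0 unchanged on the first iteration
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)  # g^(k), evaluated at the current x^(k)
        # df(x^(k))/da = g^(k) . (-g^(k-1)), so a descent step on a adds mu * g . g_prev
        alpha = alpha + mu * float(np.dot(g, g_prev))  # a^(k+1)
        x = x - alpha * g  # x^(k+1) = x^(k) + a^(k+1) d^(k)
        g_prev = g
    return x, alpha

# Hypothetical test problem: f(x) = 0.5 * ||x||^2, whose gradient is x itself
x_final, alpha_final = hypergradient_descent(lambda x: x, [1.0, -2.0])
```

On this test problem successive gradients point the same way, so the step size grows from its initial value while the iterate moves toward the minimizer at the origin.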