LARS-research / S2E

Q. Yao, H. Yang, B. Han, G. Niu, J. Kwok. Searching to Exploit Memorization Effect in Learning from Noisy Labels. ICML 2020

Questions about the update of theta #1

Closed YinghuaGao closed 4 years ago

YinghuaGao commented 4 years ago

Dear authors,

Thanks for your nice work, which has inspired me a lot, but I'm confused about some details in heng_mnist-main.py:

[screenshot of the theta-update code in heng_mnist-main.py]

I'm confused about why you choose only the largest index in cur_acc. According to the paper, I think it should be something like "np.sum(cur_acc*hypgrad)/args.n_samples". And if you only keep the largest item in cur_acc, maybe "hypgrad=hypgrad/args.n_samples" is no longer necessary.

Anyway, thanks for releasing the code!

Best, Yinghua

jerermyyoung commented 4 years ago

Hi Yinghua, thanks for your interest in our code!

For your question: yes, in the original stochastic relaxation formulation we have to sum over all elements in cur_acc. For example, the hypergradient should be computed as something like:

hypgrad = np.zeros(...)
for iii in range(args.n_samples):
    hypgrad += loggrad[iii] * cur_acc[iii]
hypgrad = hypgrad / args.n_samples

In our code, we make a small modification borrowed from [1]: we "normalize" cur_acc by rank, keeping only the largest element as 1 and setting all the others to 0. An example is given below:

original cur_acc: [0.1, 0.2, 0.3, 0.4, 0.5]
"normalized" cur_acc: [0, 0, 0, 0, 1]

This modification makes the gradient invariant to rescalings of cur_acc. For example, suppose we multiply all elements of cur_acc by 2: with the original cur_acc, the gradient would also be multiplied by 2, but with the "normalization" the gradient stays unchanged, which helps stabilize the optimization process.
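For concreteness, here is a minimal sketch of this rank-based weighting. It is not the repo's exact code; the helper name rank_normalized_hypergrad and the array shapes are assumptions for illustration only.

import numpy as np

def rank_normalized_hypergrad(loggrad, cur_acc):
    # loggrad: (n_samples, dim) per-sample gradients of the log-probability
    # cur_acc: (n_samples,) per-sample validation accuracies
    weights = np.zeros_like(cur_acc)
    weights[np.argmax(cur_acc)] = 1.0   # keep only the largest element as 1, all others as 0
    # average the weighted gradients, as in "hypgrad = hypgrad / args.n_samples"
    return (weights[:, None] * loggrad).mean(axis=0)

# Rescaling cur_acc (e.g. doubling every entry) leaves the result unchanged:
loggrad = np.random.randn(5, 3)
cur_acc = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
assert np.allclose(rank_normalized_hypergrad(loggrad, cur_acc),
                   rank_normalized_hypergrad(loggrad, 2 * cur_acc))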

For more details about this "normalization" approach (called Information-Geometric Optimization, or IGO), see Section 2.2 of [1], where it is investigated more thoroughly.

If you have any other questions, feel free to ask us.

[1] Ollivier et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR 18 (2017)

YinghuaGao commented 4 years ago

Thanks for your quick response, which totally addresses my concern :)