Optimizer selection

Thanks for sharing the code!

I have a question about how to select the optimizer when training diffmask. I see that Lookahead RMSprop is used in 'How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking', but in this work plain RMSprop was chosen. Why did you change the optimizer? Does the choice of optimizer affect the results much?

It would help me a lot if I could get some advice!