MichSchli / GraphMask

Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.
MIT License

optimizer selection #3

Open Tan-Hexiang opened 1 year ago

Tan-Hexiang commented 1 year ago

Thanks for sharing the code!

I have a question about how to select the optimizer when training DiffMask. Lookahead RMSprop is used in 'How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking', but in this work RMSProp is chosen. Why did you change the optimizer? Does the choice of optimizer affect the results much?
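To make the question concrete, here is a minimal pure-Python sketch of how I understand the two setups to differ. It is not taken from either code base: the toy quadratic loss, all function names, and the hyperparameters (`lr=0.01`, `rho=0.99`, `k=5`, `alpha=0.5`) are assumptions chosen only for illustration.

```python
import math


def toy_loss(theta):
    # Hypothetical stand-in for the real training loss: f(theta) = sum(theta_i^2).
    return sum(p * p for p in theta)


def toy_grad(theta):
    # Gradient of the toy loss above.
    return [2.0 * p for p in theta]


def rmsprop_step(theta, grad, sq_avg, lr=0.01, rho=0.99, eps=1e-8):
    # One RMSprop update. `sq_avg` is the running average of squared
    # gradients and is updated in place.
    updated = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        sq_avg[i] = rho * sq_avg[i] + (1.0 - rho) * g * g
        updated.append(p - lr * g / (math.sqrt(sq_avg[i]) + eps))
    return updated


def train_rmsprop(theta0, steps):
    # RMSprop on its own: every step updates the weights directly.
    theta = list(theta0)
    sq_avg = [0.0] * len(theta)
    for _ in range(steps):
        theta = rmsprop_step(theta, toy_grad(theta), sq_avg)
    return theta


def train_lookahead_rmsprop(theta0, steps, k=5, alpha=0.5):
    # Lookahead around RMSprop: the fast weights take k RMSprop steps, then
    # the slow weights move a fraction alpha toward them and the fast
    # weights are reset to the slow weights.
    fast = list(theta0)
    slow = list(theta0)
    sq_avg = [0.0] * len(fast)
    for t in range(1, steps + 1):
        fast = rmsprop_step(fast, toy_grad(fast), sq_avg)
        if t % k == 0:
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return fast


if __name__ == "__main__":
    start = [1.0, -2.0]
    print("RMSprop loss:", toy_loss(train_rmsprop(start, 200)))
    print("Lookahead RMSprop loss:", toy_loss(train_lookahead_rmsprop(start, 200)))
```

On a toy problem like this both variants reduce the loss, so the sketch only shows the mechanical difference; it says nothing about how they compare on the actual masking objective.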

It would help me a lot if I could get some advice!