Dear Optax team,
I am implementing Model-Agnostic Meta-Learning (MAML) in my project, and I noticed that using Optax's Adam optimizer with its default settings for the inner loop produces NaN values in the meta-gradients. The documentation touches on this: it mentions that `eps_root` should be set to a small constant to avoid dividing by zero when rescaling, which is needed when computing meta-gradients through the optimizer. Could you please recommend a good default value for `eps_root` in a meta-learning scenario?
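For context, here is a minimal sketch of how I am setting up the inner loop. The loss, data, and the `1e-8` value are placeholders I picked for testing, not values I am confident in; the learning rate is likewise arbitrary:

```python
import jax
import jax.numpy as jnp
import optax

# Placeholder values for illustration; eps_root=1e-8 is my guess,
# and choosing it well is exactly what this question is about.
INNER_LR = 0.01
EPS_ROOT = 1e-8

# Inner-loop Adam. With the default eps_root=0.0, the sqrt in the
# update rule is non-differentiable at zero, so differentiating
# through the inner step yields NaN meta-gradients.
inner_opt = optax.adam(learning_rate=INNER_LR, eps_root=EPS_ROOT)

def inner_loss(params, x, y):
    # Toy linear-regression loss, for illustration only.
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def inner_step(params, opt_state, x, y):
    # One adaptation step with the inner optimizer.
    grads = jax.grad(inner_loss)(params, x, y)
    updates, opt_state = inner_opt.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state

def meta_loss(meta_params, x, y):
    # MAML-style objective: loss after one inner step,
    # differentiated through the Adam update.
    opt_state = inner_opt.init(meta_params)
    adapted, _ = inner_step(meta_params, opt_state, x, y)
    return inner_loss(adapted, x, y)

# Toy data; with eps_root=0.0 these meta-gradients come back NaN,
# with a small positive eps_root they are finite.
x = jnp.ones((4, 3))
y = jnp.ones((4,))
meta_params = jnp.zeros((3,))
meta_grads = jax.grad(meta_loss)(meta_params, x, y)
print(meta_grads)
```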