AntreasAntoniou / HowToTrainYourMAMLPytorch

The original code for the paper "How to train your MAML", along with a replication of the original "Model-Agnostic Meta-Learning" (MAML) paper, in PyTorch.
https://arxiv.org/abs/1810.09502

About the configs of `MAML` and `MAML++` #29

Closed · yang-jin-hai closed this issue 4 years ago

yang-jin-hai commented 4 years ago

I noticed that the MAML and MAML++ configs for mini-ImageNet differ in the "total_epochs_before_pause" setting (100 for the former, 101 for the latter). I'm wondering what this config means and why the values differ; the Omniglot configs seem to use identical values.

I also noticed that the config "first_order_to_second_order_epoch" is set to -1, which seems to conflict with the Derivative-Order Annealing strategy.

Thanks in advance!

yang-jin-hai commented 4 years ago

I also noticed that the MAML++ configs for mini-ImageNet don't use the Cosine Annealing of Meta-Optimizer Learning Rate strategy, since the minimum learning rate is set equal to the initial learning rate.

AntreasAntoniou commented 4 years ago

"total_epochs_before_pause" is a hyperparameter that controls how often python will exit. I used it mainly because I sometimes had to break my experiments in multiple chunks due to compute limitation on clusters.

first_order_to_second_order_epoch is overall useful for speeding things up and is used in various experiments. However, in some other cases you might converge faster without it, which is why I left it set that way in some configs.
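
As a rough illustration of derivative-order annealing, the sketch below toggles `create_graph` in `torch.autograd.grad` once a threshold epoch is reached; the function signature and the treatment of a non-positive threshold are assumptions for illustration, not the repository's exact implementation:

```python
# Minimal sketch (assumed semantics): use cheap first-order inner-loop
# gradients early on, then switch to full second-order gradients once
# `first_order_to_second_order_epoch` has been reached.
import torch

def inner_loop_grads(loss, fast_weights, epoch, first_order_to_second_order_epoch):
    # Assumption: a non-positive threshold means second-order from the start.
    use_second_order = (first_order_to_second_order_epoch <= 0
                        or epoch >= first_order_to_second_order_epoch)
    return torch.autograd.grad(
        loss,
        fast_weights,
        # Keep the graph of the inner update so the outer (meta) loss can
        # backpropagate through it when second-order gradients are enabled.
        create_graph=use_second_order,
    )
```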


AntreasAntoniou commented 4 years ago

That's because in certain experiments you can converge faster without the cosine annealer. I provided the exact configs needed to reproduce the results reported in the paper.
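
To see why an equal minimum and initial learning rate disables the annealer: the cosine schedule interpolates between the initial rate and the minimum rate, so when the two coincide the learning rate stays constant. A minimal sketch using PyTorch's `CosineAnnealingLR` (the model, learning rate, and `T_max` are placeholders, not the repository's settings):

```python
# Minimal sketch: with eta_min equal to the initial learning rate, the cosine
# schedule eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_max))
# collapses to a constant, i.e. the annealer is effectively turned off.
import torch

meta_lr = 1e-3
model = torch.nn.Linear(10, 10)  # stand-in for the meta-learner
optimizer = torch.optim.Adam(model.parameters(), lr=meta_lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=meta_lr)  # eta_min == initial lr

for epoch in range(3):
    optimizer.step()                 # (no-op here; no gradients computed)
    scheduler.step()
    print(scheduler.get_last_lr())   # stays at [0.001] every epoch
```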


yang-jin-hai commented 4 years ago

Thanks! I've read all the code, and it has benefited me a lot. I can reproduce the results in the paper with the configs you provided. Awesome work!