dragen1860 / MAML-Pytorch

Elegant PyTorch implementation of the paper Model-Agnostic Meta-Learning (MAML)
MIT License

Some questions I hope can be answered #10

Closed · flexibility2 closed this 6 years ago

flexibility2 commented 6 years ago

Hi, it's a pleasure for me to read your code. However, I have some doubts, and I would be very grateful if you could answer them.

1. For a task i, why repeat training K times, and doesn't that lead to overfitting? (What I saw in the original paper was "Sample K datapoints ..." (Algorithm 2, line 5), rather than repeating the same sample K times.)

2. In the BatchNorm layer, why do you remove the "mean" and "variance" parameters?

dragen1860 commented 6 years ago

Hi,

  1. It is not repeating training K times. I don't know why you say that.
  2. What do you mean by "move the mean and variance"? Maybe you should make your doubts clearer.
flexibility2 commented 6 years ago

Sorry, let me explain in Chinese instead (^_^)

1. https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L374 I think this loop iteratively updates task[i] k times (from 0 to k-1). I don't quite understand why it is done this way?

2. https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L83 Here, the parameters "bn_mean" and "bn_variance" are set to not require gradients, and they are empty every time they are used, so in BatchNorm these two parameters effectively do nothing, right? I don't understand why these two parameters are removed.

I hope to hear back from you. I'm a first-year master's student, and my advisor gave me "few-shot learning" as my research direction. You seem to have done a lot of work in this area; is there any experience you could share? No one else in my lab works on this, so I feel like I'm learning blindly... Thank you!

dragen1860 commented 6 years ago
  1. According to the paper, both one gradient step and several gradient steps are supported. So K here means the theta_prime parameters will be updated K times on each individual task (as sketched below). In the actual code from Chelsea Finn, she chose K=5 in meta-train and K=10 in meta-test. It is different from the parameter k_shot.
  2. Not every call will empty bn_mean & bn_variance. They are emptied only when the model is created. So if you resume training, you are supposed to load a checkpoint that saved the whole network's weights, including bn_mean & bn_variance. In that case, bn_mean and bn_variance stay saved and work as normal.
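For readers following the thread, here is a minimal sketch of the inner loop being discussed (the names `inner_update`, `theta`, and `inner_lr` are illustrative, not this repo's actual API): each task's fast weights theta_prime take K plain gradient steps on that task's support set, while k_shot only decides how many support examples each task contains.

```python
import torch

def inner_update(theta, loss_fn, support_x, support_y, K=5, inner_lr=0.01):
    # theta: list of meta-parameter tensors (requires_grad=True).
    # K counts inner gradient steps; it is independent of k_shot, which
    # only controls how many support examples each task contains.
    theta_prime = [p.clone() for p in theta]  # fast weights, still in the autograd graph
    for _ in range(K):
        loss = loss_fn(theta_prime, support_x, support_y)
        # create_graph=True keeps the inner steps differentiable for the meta-update
        grads = torch.autograd.grad(loss, theta_prime, create_graph=True)
        theta_prime = [p - inner_lr * g for p, g in zip(theta_prime, grads)]
    return theta_prime  # the meta-loss on the query set is then taken w.r.t. theta

# Toy usage: one linear-regression "task"
w = torch.randn(3, 1, requires_grad=True)                     # meta-parameter theta
mse = lambda params, x, y: ((x @ params[0] - y) ** 2).mean()
x, y = torch.randn(10, 3), torch.randn(10, 1)
fast_w = inner_update([w], mse, x, y, K=5)                    # adapted theta_prime
```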
flexibility2 commented 6 years ago

However, I wonder: if "bn_mean & bn_variance" have requires_grad=False, then during backpropagation their gradients can't be computed, so how can bn_mean & bn_variance be updated without gradients? https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L83

dragen1860 commented 6 years ago

Hi, they are updated from the statistics of each batch of data, not via backprop. Please refer to a batch norm tutorial to understand this.
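A quick generic demonstration of that point (using standard torch.nn.BatchNorm1d, not this repo's custom layer): the running mean and variance are buffers with requires_grad=False, yet they still change during a training-mode forward pass because they are updated from batch statistics rather than by gradients.

```python
import torch

bn = torch.nn.BatchNorm1d(4)            # running_mean/running_var start at 0/1
print(bn.running_mean.requires_grad)    # False: these are buffers, not Parameters

bn.train()
x = torch.randn(32, 4) * 3 + 5          # batch with mean ~5, std ~3
_ = bn(x)                               # one forward pass in training mode
print(bn.running_mean)                  # nudged toward ~5 by the momentum update,
                                        # with no backward pass involved
```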

flexibility2 commented 6 years ago

OK, I get it. Thanks a lot! Just one more question I hope to get an answer to: at line 399 (https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L399) you say "this is a potential problem". Could you explain this potential bug specifically?

dragen1860 commented 6 years ago

Please refer to this: https://github.com/dragen1860/MAML-Pytorch/issues/6

flexibility2 commented 6 years ago

Thank you again. I mean no offense, but I also think it is not correct to update the parameters during meta-testing. https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L399

dragen1860 commented 6 years ago

Yes, I know. Please discuss this bug in https://github.com/dragen1860/MAML-Pytorch/issues/6 if you have a feasible solution.
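For anyone who lands on the same question, here is a hedged sketch of what a correct meta-test loop might look like (`meta_test` and its arguments are hypothetical, not this repo's interface): adapt a copy of the meta-model on the support set, evaluate on the query set, and discard the copy so the meta-parameters are never updated during meta-testing.

```python
import copy
import torch

def meta_test(meta_model, loss_fn, support_x, support_y, query_x, query_y,
              K=10, inner_lr=0.01):
    # Adapt a throwaway copy, so the meta-parameters are never written back.
    model = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(K):                    # fine-tune on the support set only
        opt.zero_grad()
        loss_fn(model(support_x), support_y).backward()
        opt.step()
    with torch.no_grad():                 # the query set is for evaluation only
        return loss_fn(model(query_x), query_y).item()
```

Because each meta-test task starts from a fresh copy of the same meta-initialization, results on different tasks stay independent of the order in which they are evaluated.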

txw1997 commented 4 years ago

I enrolled last year, and my direction is also few-shot learning... Have you made any progress? It feels so hard.