hi,
Apologies, I'll explain in Chinese (^_^). 1. https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L374 I think this loop updates task[i] iteratively K times (from 0 to K-1); I don't quite understand why it is done this way.
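For reference, here is a minimal sketch (with hypothetical names, not the repo's actual code) of what a K-step MAML inner loop typically does: the same task's support set is used for K successive gradient updates on a copy of the meta-parameters, and the adapted copy is what the outer loop later evaluates.

```python
import torch

def inner_loop(meta_params, loss_fn, x_spt, y_spt, k=5, lr_inner=0.01):
    """Hypothetical K-step MAML inner loop: K gradient steps on the
    same task's support set, starting from the meta-parameters."""
    # Work on differentiable copies so the meta-parameters are untouched.
    fast_weights = [p.clone() for p in meta_params]
    for _ in range(k):
        loss = loss_fn(fast_weights, x_spt, y_spt)
        # create_graph=True keeps the graph so the outer loop can
        # differentiate through these inner updates (second-order MAML).
        grads = torch.autograd.grad(loss, fast_weights, create_graph=True)
        fast_weights = [w - lr_inner * g for w, g in zip(fast_weights, grads)]
    return fast_weights
```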
2. https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L83 Here, the "bn_mean" and "bn_variance" parameters are set to not require gradients, and they are empty on every call. Doesn't that mean these two parameters have no effect in BatchNorm? I don't understand why they were removed.
I hope to get a reply from you. I'm a first-year master's student, and my advisor assigned me "few-shot learning" as a research direction. You seem to have done a lot of work in this area; is there any experience you can share? Nobody else in my lab works on this, so I feel like I'm learning blindly... Thank you~
bn_mean & bn_variance will be emptied only when the model is created.
So if you resume training from a previous run, you are supposed to load the checkpoint, which saves the whole network's weights, including bn_mean & bn_variance. In that case, bn_mean and bn_variance keep their saved values and work as normal.
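As a sketch of that resume path (the file name and model are just examples): in PyTorch, a model's state_dict contains both learnable parameters and registered buffers such as BatchNorm's running_mean and running_var, so saving and restoring it preserves those statistics.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 32, 3), nn.BatchNorm2d(32), nn.ReLU())

# state_dict() contains parameters *and* buffers, e.g. '1.running_mean'
# and '1.running_var' for the BatchNorm layer above.
torch.save(model.state_dict(), 'ckpt.pth')

# On resume, the running statistics come back exactly as saved.
model.load_state_dict(torch.load('ckpt.pth'))
```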
However, I wonder: if bn_mean & bn_variance have requires_grad=False, then their gradients can't be computed during back-propagation, so how can bn_mean & bn_variance be updated without gradients? https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L83
hi, they are updated from the statistics of each batch of data, not through backpropagation. Please refer to a batch norm tutorial to understand this.
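To make this concrete, here is a minimal sketch (hypothetical function, not the repo's code) of the standard batch-norm bookkeeping: the running statistics are plain buffers updated by an exponential moving average of batch statistics, with no gradient involved.

```python
import torch

def batch_norm_forward(x, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Sketch of training-mode batch norm for an (N, C, H, W) input."""
    # Per-channel statistics of the current batch.
    batch_mean = x.mean(dim=(0, 2, 3))
    batch_var = x.var(dim=(0, 2, 3), unbiased=False)
    # Exponential moving average update of the running statistics,
    # done under no_grad: no gradient ever flows into these buffers.
    with torch.no_grad():
        running_mean.mul_(1 - momentum).add_(momentum * batch_mean)
        running_var.mul_(1 - momentum).add_(momentum * batch_var)
    # Normalize with the *batch* statistics during training.
    return (x - batch_mean[None, :, None, None]) / torch.sqrt(
        batch_var[None, :, None, None] + eps)
```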
OK, I get it. Thanks a lot! Just one more question I hope to get an answer to: at line 399 (https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L399) you say "this is a potential problem"; could you explain that potential bug specifically?
Please refer to this: https://github.com/dragen1860/MAML-Pytorch/issues/6
Thank you again. You know, I have no intention of offending, but I also think it is not correct to update the parameters during meta-testing. https://github.com/dragen1860/MAML-Pytorch/blob/master/maml.py#L399
Yes, I know. Please discuss this bug in https://github.com/dragen1860/MAML-Pytorch/issues/6 if you have a feasible solution.
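One common way to avoid this kind of issue (a sketch under the assumption that the concern is the meta-parameters being modified at test time; names are hypothetical and this is not the repo's actual fix): adapt a deep copy of the meta-learned model on the test task's support set and evaluate the copy, so the meta-parameters are never written back.

```python
import copy
import torch
import torch.nn.functional as F

def meta_test(model, x_spt, y_spt, x_qry, y_qry, k=5, lr_inner=0.01):
    # Deep-copy the meta-learned model so test-time adaptation never
    # writes back into the meta-parameters.
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr_inner)
    for _ in range(k):
        opt.zero_grad()
        F.cross_entropy(adapted(x_spt), y_spt).backward()
        opt.step()
    # Evaluate the adapted copy on the query set.
    with torch.no_grad():
        pred = adapted(x_qry).argmax(dim=1)
        return (pred == y_qry).float().mean().item()
```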
I enrolled last year, and my direction is also few-shot learning... Have you made any progress? It feels really hard.
Hi, it's a pleasure for me to read your code. However, I have some doubts, and I would be very grateful if you could reply with answers.
1. For a task i, why repeat training K times, and doesn't that lead to overfitting? (What I saw in the original paper was "Sample K datapoints ..." (Algorithm 2, line 5), rather than repeating the same sample K times.)
2. In the BatchNorm layer, why do you remove the "mean" and "variance" parameters?