danieltan07 / learning-to-reweight-examples

PyTorch Implementation of the paper Learning to Reweight Examples for Robust Deep Learning

Why redefine the default PyTorch nn.Modules? #2

ChengYuHsieh opened this issue 6 years ago

ChengYuHsieh commented 6 years ago

Hi, thank you for providing a clean implementation of the paper. I am wondering if there's any reason to redefine the PyTorch modules in model.py. I've seen several similar implementations on related topics like MAML, but I'm still quite confused. Could you comment on this?

Thanks a lot!

danieltan07 commented 6 years ago

Hi @ChengYuHsieh, sorry for the late reply, I did not see this until now. The reason is that we need to compute gradients after the parameter update, and the original implementations of the layers don't really allow you to do that in a straightforward way. (I may be wrong, but I think it's because the original update does something like param.data = param.data - learning_rate * gradient, so it's not going to compute gradients through the update.) I found it easier to just define the weights as variables instead.
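To make this concrete, here is a minimal sketch (not the repo's code; the toy weight, shapes, and data below are made up) of why a functional update like w_updated = w - lr * grad_w keeps the meta gradient alive, whereas an in-place update on .data would not:

```python
import torch
import torch.nn.functional as F

# Toy weight kept as a plain tensor so the update itself stays in the autograd graph.
w = torch.randn(1, 3, requires_grad=True)
x, y = torch.randn(4, 3), torch.randn(4, 1)

# Inner loss and its gradient w.r.t. w, keeping the graph (create_graph=True).
loss_inner = F.mse_loss(x @ w.t(), y)
grad_w = torch.autograd.grad(loss_inner, w, create_graph=True)[0]

# Functional update: w_updated is a new tensor that depends on w through grad_w.
lr = 0.1
w_updated = w - lr * grad_w

# A meta/validation loss evaluated with the updated weight can now be
# back-propagated all the way to w. An in-place `w.data -= lr * grad_w`
# would break this chain, because .data updates are invisible to autograd.
x_val, y_val = torch.randn(4, 3), torch.randn(4, 1)
loss_meta = F.mse_loss(x_val @ w_updated.t(), y_val)
loss_meta.backward()
print(w.grad.shape)  # gradients flow through the update step
```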

zhiqihuang commented 4 years ago

Hi, thanks for the explanation.

But is there a way to bypass this problem and still use the default nn.Modules?

I think the problem is here: grads = torch.autograd.grad(l_f_meta, (meta_net.params()), create_graph=True). If we use the default nn.Modules, we have to call meta_net.parameters() or meta_net.named_parameters(); after computing the grads, the computational graph seems to be detached, causing RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
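For what it's worth, one possible way to keep the stock nn.Module and still differentiate through the update is to run the module functionally with an external parameter dict. This is only a sketch, assuming a recent PyTorch that ships torch.func.functional_call (not available when this repo was written); the net, data, and learning rate below are placeholders, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # requires a recent PyTorch (>= 2.0)

# Placeholder stand-ins for the nets and data used in the repo.
net = nn.Linear(3, 1)
x, y = torch.randn(4, 3), torch.randn(4, 1)
x_val, y_val = torch.randn(4, 3), torch.randn(4, 1)

params = {name: p for name, p in net.named_parameters()}

# Inner loss with per-example weights eps (the quantity we want gradients of later).
eps = torch.zeros(4, requires_grad=True)
per_example = F.mse_loss(functional_call(net, params, x), y, reduction='none').squeeze()
loss_inner = (eps * per_example).sum()

# Gradients w.r.t. the module's own parameters, keeping the graph alive.
grads = torch.autograd.grad(loss_inner, tuple(params.values()), create_graph=True)

# Build updated (non-leaf) parameter tensors and run the *same* module with them.
lr = 0.1
updated = {name: p - lr * g for (name, p), g in zip(params.items(), grads)}
loss_meta = F.mse_loss(functional_call(net, updated, x_val), y_val)

# Gradient of the meta loss w.r.t. the example weights, without redefining any layers.
grad_eps = torch.autograd.grad(loss_meta, eps)[0]
print(grad_eps)
```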