LiJunnan1992 / MLNT

Meta-Learning based Noise-Tolerant Training

Could you tell me which PyTorch version you used? #9

Open Wongcheukwai opened 4 years ago

Wongcheukwai commented 4 years ago

I have tried 0.3.1, 0.4.1, and the latest version, and all of them produce errors to varying degrees (all in main.py). Thanks.

LiJunnan1992 commented 4 years ago

Could you let me know which line of code is causing issues? You should make "grad" and "p_tch" leaf tensors that do not require gradient. consistent_loss.backward() should only compute gradients for net.parameters().
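
A minimal, self-contained sketch of that advice, using the variable names from the comment above; the toy networks, data, and losses are placeholders (and the fast-weight update itself is omitted), not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the objects named above (net, p_tch, grad, logp_fast, consistent_loss).
net = nn.Linear(10, 3)                      # "student" network
teacher = nn.Linear(10, 3)                  # "mentor"/teacher network
x = torch.randn(4, 10)                      # a mini-batch
y = torch.randint(0, 3, (4,))               # dummy labels

# 1) Teacher prediction: detach it so it is a constant target with
#    requires_grad == False and no graph attached.
p_tch = F.softmax(teacher(x), dim=1).detach()

# 2) Gradients used to build the "fast weights": detach them so the later
#    backward() does not try to differentiate through the meta-graph again.
meta_loss = F.cross_entropy(net(x), y)
grads = torch.autograd.grad(meta_loss, tuple(net.parameters()), create_graph=True)
grads = [g.detach() for g in grads]         # leaf tensors, requires_grad == False

# 3) Consistency loss between the student's log-probabilities and the teacher
#    target; backward() now only populates .grad on net.parameters().
logp_fast = F.log_softmax(net(x), dim=1)
consistent_loss = F.kl_div(logp_fast, p_tch, reduction='batchmean')
consistent_loss.backward()
```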

Wongcheukwai commented 4 years ago

baseline.py is fine; I got the checkpoint and then ran main.py (where the bug is). 1. First, if I run your original code without changing anything (PyTorch 0.4.1), the error occurs on line 113, grad.detach_(). It says: RuntimeError: Can't detach views in-place. Use detach() instead. (See the small repro sketch after this comment.)

2. I tried to fix the problem by using grad.detach() as told, and deleted line 114 (to avoid the error, since grad.detach() already ensures grad.requires_grad = False). It then says: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

3. Then I changed it to consistent_loss.backward(retain_graph=True), following chiragbajaj25 in the previous issue, but that caused GPU out-of-memory problems.

I really love your work and code, but I have been stuck here for a week. Could you please help? Thank you.
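
For reference, the first error above is PyTorch's general restriction on detaching a view tensor in place; a tiny standalone repro (unrelated to the repo's code) behaves like this:

```python
import torch

x = torch.randn(3, 4, requires_grad=True)
v = x[0]                # v is a differentiable view of x

# v.detach_()           # on the PyTorch versions discussed here this raises:
#                       # RuntimeError: Can't detach views in-place. Use detach() instead.

v = v.detach()          # the out-of-place form works and drops the graph
print(v.requires_grad)  # False
```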

LiJunnan1992 commented 4 years ago

It should be grad = grad.detach(). I don't think you need to specify retain_graph=True for consistent_loss.backward(). consistent_loss is calculated independently for each logp_fast, and the graph should be destroyed after backward().
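
To illustrate that point: if each consistency term comes from its own forward pass, every backward() frees only its own graph, so retain_graph=True is not needed. A hedged sketch with a toy model (not the repo's code, and omitting the fast-weight/perturbation machinery):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Linear(10, 3)
x = torch.randn(4, 10)
p_tch = F.softmax(torch.randn(4, 3), dim=1)    # stands in for the detached teacher target

net.zero_grad()
for _ in range(5):                             # e.g. one pass per perturbed copy of the model
    logp_fast = F.log_softmax(net(x), dim=1)   # a fresh forward pass builds a fresh graph
    consistent_loss = F.kl_div(logp_fast, p_tch, reduction='batchmean')
    consistent_loss.backward()                 # frees this iteration's graph; gradients
                                               # accumulate in the .grad of net.parameters(),
                                               # and no retain_graph=True is needed because
                                               # the next iteration rebuilds its own graph
```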

Wongcheukwai commented 4 years ago

I tried grad = grad.detach(), but the same error still popped up (RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed).

LiJunnan1992 commented 4 years ago

Can you try to pinpoint which graph is being backpropagated through a second time? Have you detached p_tch as well?
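
One way to answer the first question (an extra suggestion, not something from this thread) is PyTorch's anomaly detection, available in recent versions: when backward() fails, it also prints the traceback of the forward call that built the failing node, which usually reveals which graph is being traversed twice.

```python
import torch

# With anomaly detection enabled, autograd errors raised during backward()
# come with the traceback of the forward op that created the failing node.
with torch.autograd.detect_anomaly():
    w = torch.randn(3, requires_grad=True)
    loss = (w ** 2).sum()
    loss.backward()
    # A second backward() over the same graph would reproduce the error from
    # this thread ("Trying to backward through the graph a second time"):
    # loss.backward()
```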