coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0

RuntimeError during training: modifying a view in place #100

Open megodoonch opened 1 year ago

megodoonch commented 1 year ago

I get a Runtime error when training on the toy corpus in example/:

```
File "graph_dependency_parser/components/cle.py", line 85, in cle_loss
    m[range, g] -= 1.0  # cost augmentation
RuntimeError: Output 0 of SliceBackward0 is a view and is being modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
```

If I comment this line out, training goes through. I'm using the dependency versions from @tsimafeip's Docker fork. The current main branch has the same line of code, though.

It doesn't look crucial; does anyone know if I can just leave it out for now?

Here's my training log: training-error.log
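For anyone else hitting this, the error can be reproduced outside the parser. A minimal, standalone sketch (the tensors here are made up for illustration, not the actual ones in cle.py): functions like `chunk`, `split`, and `unbind` return multiple views of one tensor, and recent PyTorch versions forbid modifying any of those views in place.

```python
import torch

# x requires grad, so autograd tracks views of it.
x = torch.randn(4, 4, requires_grad=True)

# chunk returns multiple views of x's storage.
m, _ = x.chunk(2, dim=0)

try:
    m[0, 0] -= 1.0  # in-place subtraction on a multi-output view
except RuntimeError as e:
    print("RuntimeError:", e)
```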

megodoonch commented 1 year ago

My student just reported to me that the version he downloaded a few weeks ago didn't have this problem (but did have this line of code), so I'm thinking this is a bug in the fork's environment instead?

jgroschwitz commented 1 year ago

I'm not sure if you can leave it out. However, if that line is the in-place operation that the error complains about, then writing

m[range, g] = m[range, g] - 1.0

should do the trick, see here.

I don't know why this fails in the pull request and not in the earlier experiment of your student. When I get around to reviewing the pull request (hopefully this week), I'll look into it.

megodoonch commented 1 year ago

Seems all I have to do is make a bug report and I solve the problem myself. All it takes is enough Googling...

The Dockerfile doesn't specify the version of PyTorch, so it uses the latest. The instructions say to use 1.1, so I modified the Dockerfile, and now it works.

I didn't realise that your solution was no longer modifying in place, so I didn't try it.

Is there any reason not to use the highest version of everything that doesn't throw errors? Should I stick to the versions in the instructions?
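As a general practice, pinning the versions a project was actually tested with avoids exactly this kind of breakage from upstream behaviour changes. A hypothetical pin matching the instructions' PyTorch 1.1 requirement (the exact package spec is an assumption, not taken from the repo's Dockerfile):

```shell
# Hypothetical version pin; adjust to whatever the project's
# instructions actually list. Newer PyTorch turned this pattern
# from a UserWarning into a RuntimeError, so "latest" can break
# code that older releases accepted.
pip install "torch==1.1.0"
```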

megodoonch commented 1 year ago

Note: changing to `m[range, g] = m[range, g] - 1.0` isn't enough (same error), but using an older version of PyTorch is; probably anything earlier than 1.6 works.
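A possible explanation for why the rewrite still fails (my reading, not verified against cle.py's actual tensors): the right-hand side is computed out of place, but the indexed assignment itself still writes into `m` in place, so if `m` is itself a protected view the check still fires. Cloning first sidesteps it:

```python
import torch

x = torch.randn(4, 4, requires_grad=True)
m, _ = x.chunk(2, dim=0)        # m is a protected multi-output view
rows = torch.arange(2)
g = torch.tensor([0, 1])

# m[rows, g] = m[rows, g] - 1.0  # still errors: the assignment writes into m in place

m = m.clone()                    # clone gives m its own storage; gradients still flow
m[rows, g] = m[rows, g] - 1.0    # writing into the clone is fine
```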

megodoonch commented 1 year ago

So this seems to be a similar issue to this one: https://stackoverflow.com/questions/67768535/getting-this-warning-output-0-of-backwardhookfunctionbackward-is-a-view-and-is

Back then it was just a UserWarning, and now it's a RuntimeError.

UserWarning: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is deprecated and will be forbidden starting version 1.6. You can remove this warning by cloning the output of the custom Function.

I reopened and re-tagged this issue because the warning does say that it leads to incorrect gradients. So this line of code might not be doing what was intended.