apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

test_cross_device_autograd fails #8495

Open John-Boik opened 7 years ago

John-Boik commented 7 years ago

A snippet of code to test auto-differentiation across devices fails on my machine and I don't understand why. The code is copied from the test_cross_device_autograd function in test_operator_gpu.py.

The following fails:

import mxnet as mx  # needed for mx.gpu(0) below
from mxnet import nd, autograd

x = nd.random_uniform(shape=(10,))
x.attach_grad()
with autograd.record():
    y = nd.tanh(x)
    y = y.copyto(mx.gpu(0))
    # y.attach_grad()  # I added this line
    y.backward()

The error message is:

src/ndarray/autograd.cc:237: Check failed: !i.entry_.is_none() Cannot differentiate node because it is not in a computational graph. You need to set is_recording to true or use autograd.record() to save computational graphs for backward. If you want to differentiate the same graph twice, you need to pass retain_graph=True to backward.

However, if I uncomment the y.attach_grad() line, there is no error. I'm using MXNet v0.11.0, Python 3, and Ubuntu 16.04. Can anyone explain why the extra call to attach_grad() is required? Is this now the expected behavior, meaning the test code is outdated?

John-Boik commented 7 years ago

I wonder whether this unexpected behavior is related to a question I recently posted on Stack Overflow about getting a simple model-parallelism example with a Gluon block to work.

szha commented 7 years ago

The y.attach_grad() call shouldn't be there. See https://github.com/apache/incubator-mxnet/blob/master/tests/python/gpu/test_operator_gpu.py#L1425

John-Boik commented 7 years ago

Yes, that is the point of my post. I had to add the y.attach_grad() in order to get the script to work. If I do not use y.attach_grad(), that is, if I use the original test code, I get the error.

szha commented 6 years ago

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.