ShusenTang / Dive-into-DL-PyTorch

This project reimplements the MXNet code from the original book Dive into Deep Learning (《动手学深度学习》) in PyTorch.
http://tangshusen.me/Dive-into-DL-PyTorch
Apache License 2.0

Running train_ch3 in 3.6_softmax-regression-scratch raises "leaf variable has been moved into the graph interior" #119

Open leungzzz opened 4 years ago

leungzzz commented 4 years ago

The code I am running is the following:

num_epochs, lr = 5, 0.1
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step()  # will be used in the "concise implementation of softmax regression" section

            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
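
For context, d2l.sgd here is the book's from-scratch SGD. As I recall it from the d2lzh_pytorch package (an assumption, not a verified quote), the update goes through param.data precisely so that W and b stay leaf tensors outside the autograd graph:

def sgd(params, lr, batch_size):
    # update through .data so the assignment is not recorded by autograd
    # and the parameters remain leaf tensors
    for param in params:
        param.data -= lr * param.grad / batch_size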

The following error occurs:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-066faa4268ff> in <module>()
     31               % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
     32 
---> 33 train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

<ipython-input-16-066faa4268ff> in train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr, optimizer)
     17 #                     param.grad.data.zero_()
     18 
---> 19             l.backward()
     20             if optimizer is None:
     21                 d2l.sgd(params, lr, batch_size)

/home/tpg/anaconda3/lib/python3.5/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    164                 products. Defaults to ``False``.
    165         """
--> 166         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    167 
    168     def register_hook(self, hook):

/home/tpg/anaconda3/lib/python3.5/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: leaf variable has been moved into the graph interior

Version info: Python 3.5, PyTorch 1.3.1, torchvision 0.4.2

I first typed out the first half of this function by hand and ran into this error; I then downloaded the notebook file directly and ran it in Jupyter, and the same error still occurred.

According to the error message, the leaf tensors W and b have been modified somewhere. To check, I added the following to the code snippet:

print(params[0].is_leaf, params[1].is_leaf)

On the first epoch this prints True, True, but from the second epoch onward both become False, so the parameters have indeed been changed into non-leaf (graph-interior) tensors; I have not been able to pin down exactly where this happens.
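
A minimal sketch of how a leaf tensor turns into a graph-interior one (a hypothetical illustration of the is_leaf behaviour above, not a confirmed diagnosis of this issue): any out-of-place, autograd-tracked update gives the parameter a grad_fn, while updating through .data or under torch.no_grad() keeps it a leaf.

import torch

W = torch.zeros(2, requires_grad=True)
print(W.is_leaf)   # True: created directly by the user

W = W - 0.1        # out-of-place update: the result has a grad_fn
print(W.is_leaf)   # False: W is now an interior node of the graph

b = torch.zeros(2, requires_grad=True)
with torch.no_grad():
    b -= 0.1       # in-place update, not recorded by autograd
print(b.is_leaf)   # True: b is still a leaf

So it may be worth checking that the local copy of sgd (or any other code that touches W and b) only updates them in place, through .data or under torch.no_grad().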