PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

2.0-beta: BatchNorm1d raises NotFoundError: No Output(X@GRAD) found for BatchNormGrad operator. #28191

Closed BlackHorseQ closed 4 years ago

BlackHorseQ commented 4 years ago

Running the code below on Baidu AI Studio: it runs fine without BatchNorm1d, but once BatchNorm1d is added it fails, reporting that no gradient is found.

import numpy as np
import joblib
import paddle
import paddle.fluid as fluid
from paddle.fluid import layers


class base_model(fluid.dygraph.Layer):
    def __init__(self, classes_num: int):
        super().__init__()
        self.hidden_size = 128
        self.batchNorm1d = paddle.nn.BatchNorm1d(5)
        self.lstm = paddle.nn.LSTM(input_size=20, hidden_size=self.hidden_size,
                                   direction="bidirectional")
        self.avgpool1d = paddle.nn.AvgPool1d(kernel_size=self.hidden_size*2, stride=self.hidden_size*2)
        self.maxpool1d = paddle.nn.MaxPool1d(kernel_size=self.hidden_size*2, stride=self.hidden_size*2)

    def forward(self, input):
        # input: (batch_size, max_len, dim)
        x = self.batchNorm1d(input)
        # x = input
        rnn_out = self.lstm(x)[0]
        mean_out = self.avgpool1d(x)
        max_out = self.maxpool1d(x)
        r_shape = (mean_out.shape[0], mean_out.shape[1])
        mean_pool_out = layers.reshape(mean_out, shape=r_shape)
        max_pool_out = layers.reshape(max_out, shape=r_shape)
        add_output = mean_pool_out + max_pool_out
        concat_output = layers.concat((mean_pool_out, max_pool_out), axis=1)

        output = layers.fc(concat_output, size=4)
        return output

if __name__ == '__main__':
    # create the model
    # with fluid.dygraph.guard():
    program = fluid.default_main_program()
    program.random_seed = 2020
    model = base_model(4)
    print('start training ... {} kind'.format(4))
    model.train()
    epoch_num = 30
    # define the optimizer
    opt = fluid.optimizer.Adam(learning_rate=0.001, parameter_list=model.parameters())
    # define the data readers: one for training and one for validation
    x = joblib.load('train/preprocess_file/20190701_x.pkl')
    y = joblib.load('train/preprocess_file/20190701_y.pkl')
    train_loader = data_loader(x, y, 1024)
    valid_loader = data_loader(x, y, 1024)

    best_acc = 0
    valid_acc = 0

    print('start training ... {} kind'.format(4))
    for epoch in range(epoch_num):
        all_loss = 0
        model.train()

        for batch_id, data in enumerate(train_loader()):
            x_data, y_data = data
            x = paddle.to_tensor(x_data)
            label = paddle.to_tensor(y_data)
            label = paddle.fluid.one_hot(label, depth=4)
            # run the forward pass to get predictions
            logits = model(x)
            # compute the loss
            softmax_logits = fluid.layers.softmax(logits)
            loss = fluid.layers.cross_entropy(softmax_logits, label, soft_label=True)
            avg_loss = fluid.layers.mean(loss)
            all_loss += avg_loss.numpy()
            avg_l = all_loss / (batch_id + 1)
            # pbar.set_description("epoch: {}, batch_id: {}, loss is: {}, avg loss is: {}, valid acc is: {}".format(
            #     epoch, batch_id, avg_loss.numpy(), avg_l, valid_acc))
            if batch_id % 100 == 0:
                print("epoch: {}, batch_id: {}, loss is: {}, avg loss is: {}, valid acc is: {}".format(
                    epoch, batch_id, avg_loss.numpy(), avg_l, valid_acc))
            avg_loss.backward()
            opt.minimize(avg_loss)
            model.clear_gradients()
            # break
        model.eval()
        accuracies = []
        losses = []
        for batch_id, data in enumerate(valid_loader()):
            x_data, y_data = data
            img = fluid.dygraph.to_variable(x_data)
            label = fluid.dygraph.to_variable(y_data)
            # run the forward pass to get predictions
            logits = model(img)
            # compute softmax prediction probabilities and evaluate accuracy
            pred = fluid.layers.softmax(logits)
            acc = fluid.layers.accuracy(pred, fluid.layers.reshape(label, [-1, 1]))
            accuracies.append(acc.numpy())
        valid_acc = np.mean(accuracies)
        if valid_acc > best_acc and epoch >= 2:
            if valid_acc < 0.98:
                continue
            best_acc = np.mean(accuracies)
            # save params of model
            fluid.save_dygraph(model.state_dict(), './params/{}fold_{}epoch_{:.3f}'.format(fold, epoch, best_acc))
            # save optimizer state
            fluid.save_dygraph(opt.state_dict(), './params/{}fold_{}epoch_{:.3f}'.format(fold, epoch, best_acc))

0   paddle::imperative::BasicEngine::Execute()
1   paddle::imperative::OpBase::Run(paddle::framework::OperatorBase const&, paddle::imperative::NameVariableWrapperMap const&, paddle::imperative::NameVariableWrapperMap const&, paddle::framework::AttributeMap const&, paddle::platform::Place const&)
2   paddle::imperative::PreparedOp::Run(paddle::imperative::NameVariableWrapperMap const&, paddle::imperative::NameVariableWrapperMap const&, paddle::framework::AttributeMap const&)
3   paddle::operators::BatchNormGradOp::InferShape(paddle::framework::InferShapeContext*) const
4   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString()


Error Message Summary:

NotFoundError: No Output(X@GRAD) found for BatchNormGrad operator. [Hint: Expected ctx->HasOutput(framework::GradVarName("X")) == true, but received ctx->HasOutput(framework::GradVarName("X")):0 != true:1.] (at /paddle/paddle/fluid/operators/batch_norm_op.cc:466)
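For reference, a minimal sketch of the condition the hint describes, using the same 2.0-beta API as the code above (the shapes and random input are illustrative assumptions): the tensor returned by paddle.to_tensor has stop_gradient=True by default, so no gradient with respect to the BatchNorm input is requested, and BatchNormGrad is left without an X@GRAD output.

    import numpy as np
    import paddle

    bn = paddle.nn.BatchNorm1d(5)  # same num_features as in the model above
    x = paddle.to_tensor(np.random.randn(8, 5, 20).astype('float32'))
    # x.stop_gradient is True by default, so no X@GRAD is requested for the BatchNorm input
    out = bn(x)
    loss = paddle.mean(out)
    loss.backward()  # on 2.0-beta this reportedly fails with the NotFoundError above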

BlackHorseQ commented 4 years ago

Problem solved. The optimizer was given parameter_list=model.parameters(), while the input to the normalization layer is non-differentiable by default (x.stop_gradient = True), so no gradient with respect to it is produced. This is something worth considering during development: should it be differentiable by default?
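A minimal sketch of the workaround implied above, reusing the variable names from the training loop in the original post: explicitly mark the input as differentiable before the forward pass, so that BatchNormGrad has an X@GRAD output to write to.

    x = paddle.to_tensor(x_data)
    x.stop_gradient = False  # inputs are non-differentiable by default; request a gradient for x
    label = paddle.to_tensor(y_data)
    logits = model(x)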

gfwm2013 commented 4 years ago

@BlackHorseQ Thanks for the suggestion; I will relay your request to the team internally.

Guangjun-A commented 3 years ago

It is indeed a gradient problem...