I noticed that the issue "Input(Out@GRAD) shouldn't be null错误怎么排查" #16528 is similar, but it also seems somewhat different from my case.
Specifically, I defined a generator:

def generator(parent, program, x_real, temperature, temperature_pretrain, w_dict, vocab_size, batch_size, seq_len, gen_emb_dim, mem_slots, head_size, num_heads,
              hidden_dim, start_token):
    IS_SPARSE = True
    parent = 'generator'
    # ... rest of the generator body ...

For the pretrain loss, I then defined the following optimization step:

# generator pre-training
grad_clip = 5.0  # keep the same as the previous setting
pretrain_opt = fluid.optimizer.AdamOptimizer(gpre_lr, beta1=0.9, beta2=0.999)
fluid.clip.set_gradient_clip(
    clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=grad_clip))
pretrain_opt.minimize(g_pretrain_loss)

Then minimize(g_pretrain_loss) fails with the following error:
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/backward.py", line 706, in append_backward
_append_backwardvars(root_block, fwd_op_num, grad_to_var, grad_info_map)
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/backward.py", line 518, in _append_backwardvars
op_desc.infer_shape(block.desc)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator unsqueeze2_grad error.
Python Call stacks:
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/framework.py", line 1814, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/layers/nn.py", line 7273, in unsqueeze
"XShape": x_shape})
File "/media/data2/dingcheng/workspace/baidu/ccl/MetaCotRelGAN/models/rmc_vanilla_dict.py", line 106, in generator
x_real = fluid.layers.unsqueeze(x_real, [1])
File "/media/data2/dingcheng/workspace/baidu/ccl/MetaCotRelGAN/real/real_gan/real_train_dict.py", line 92, in real_train_dict
start_token=config['start_token'])
File "run_dict.py", line 144, in main
real_train_dict(main_program, generator, generator_meta, mediator, discriminator, oracle_loader, config)
File "run_dict.py", line 151, in
main()
C++ Call stacks:
Input(Out@GRAD) shouldn't be null. at [/paddle/paddle/fluid/operators/unsqueeze_op.cc:246]
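Reading the traceback: minimize calls append_backward, which appends an unsqueeze2_grad op for the unsqueeze at rmc_vanilla_dict.py line 106, and infer_shape then finds that the gradient variable of the unsqueeze output (Out@GRAD) was never created. A check like the following might narrow down which variable blocks gradient creation (a hypothetical diagnostic sketch; prog stands for whatever Program the generator is built into, and it should run before minimize):

import paddle.fluid as fluid

prog = fluid.default_main_program()   # or the program passed to generator()
block = prog.global_block()
for op in block.ops:
    if op.type == 'unsqueeze2':
        for name in op.input_arg_names + op.output_arg_names:
            # A var with stop_gradient=True upstream of the loss means
            # no Out@GRAD is ever created for the corresponding grad op.
            print(op.type, name, 'stop_gradient =', block.var(name).stop_gradient)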
According to the findings in #16528, the training code there used softmax_with_cross_entropy and sigmoid_cross_entropy_with_logits, and these two ops do not compute a gradient for the label input, which differs from the corresponding TensorFlow interfaces. But that does not seem to be the cause in my case. Could you help me take a look?
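To rule out an analogous cause on my side, one thing that can be checked is the stop_gradient flag on x_real before and after the unsqueeze call, since variables created by fluid.layers.data have stop_gradient=True by default (a hedged sketch with placeholder shape/dtype, not the real model code):

import paddle.fluid as fluid

# Placeholder shape/dtype for illustration only.
x_real = fluid.layers.data(name='x_real', shape=[20], dtype='int64')
print(x_real.stop_gradient)             # True by default for data layers
x_unsq = fluid.layers.unsqueeze(x_real, axes=[1])
print(x_unsq.stop_gradient)             # check whether the flag carried over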