PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle "飞桨": high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Model trained with fluid MPI reports "Input Y(0) is not initialized" during offline inference #18716

Closed — maosengshulei closed this issue 4 years ago

maosengshulei commented 5 years ago

Inference error:

```
Traceback (most recent call last):
  File "dcn_infer.py", line 117, in <module>
    infer()
  File "dcn_infer.py", line 107, in infer
    fetch_list=[v.name for v in fetch_list])
  File "/home/work/tools/paddle/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/executor.py", line 565, in run
    use_program_cache=use_program_cache)
  File "/home/work/tools/paddle/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/executor.py", line 642, in _run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator mul error.
Python Callstacks:
  File "/home/work/tools/paddle/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1654, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/work/tools/paddle/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/work/tools/paddle/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 326, in fc
    "y_num_col_dims": 1})
  File "/home/work/shulei/msd_dcn_v1/fluid_dcn.py", line 112, in DCN
    ctr_predict = fluid.layers.fc(input=ctr_fc, size=2, act="softmax", param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(scale=1 / math.sqrt(ctr_fc.shape[1]))))
  File "dcn_infer.py", line 96, in infer
    ctr_predict, dura_predict, words = DCN(args, feat_list.features)
  File "dcn_infer.py", line 117, in <module>
    infer()
C++ Callstacks:
Input Y(0) is not initialized at [/paddle/paddle/fluid/framework/operator.cc:1142]
```
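For context: in fluid's `fc` layer the actual computation is a `mul` op whose input `X` is the layer input and whose input `Y` is the weight parameter, so "Input Y(0) is not initialized" usually means an fc weight was never initialized or loaded before the program ran. A small diagnostic sketch (assuming the network has already been built into `fluid.default_main_program()`) that lists which parameters actually exist in the global scope:

```python
# Diagnostic sketch: list which parameters of the default main program have a
# variable in the global scope. A parameter missing here is exactly what makes
# the mul op fail with "Input Y(0) is not initialized".
import paddle.fluid as fluid

scope = fluid.global_scope()
for param in fluid.default_main_program().global_block().all_parameters():
    found = scope.find_var(param.name) is not None
    print(param.name, "present in scope" if found else "MISSING from scope")
```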

Inference code:

```python
args = parse_args()

place = fluid.CPUPlace()
inference_scope = fluid.core.Scope()

feat_list = dcn_reader.Msd_Feat_Builder(args.feat_list_path, args.bucket_statistic)
dataset = dcn_reader.MsdDataset(feat_list.features, args.data_path, args.mapping_path, args.seccate_map_path)
dataset.prepare_read()
test_reader = paddle.batch(dataset.infer(), batch_size=args.batch_size)

exe = fluid.Executor(place)
ctr_predict, dura_predict ,words = DCN(args, feat_list.features)
#[inference_program, feed_target_names, fetch_targets] = fluid.io.load_inference_model(executor=exe, dirname=args.model_path)
def if_exist(var):
    return os.path.exists(os.path.join(args.model_path, var.name))
#[inference_program, feed_target_names, fetch_targets] = fluid.io.load_vars(executor=exe, dirname=args.model_path, predicate=if_exist)
fluid.io.load_vars(executor=exe, dirname=args.model_path, predicate=if_exist)
feeder = fluid.DataFeeder(feed_list=words, place=place)
fetch_list = [ctr_predict]
for batch_id, data in enumerate(test_reader()):
    print data
    predict = exe.run(feed=feeder.feed(data),
                    fetch_list=[v.name for v in fetch_list])
    for i in range(len(data)):
        print predict[0][i][1]
exe.close()
```
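The commented-out `load_inference_model` line hints at an alternative loading path: instead of rebuilding the network with `DCN()` and loading variables by name, load the pruned inference program that `save_inference_model` wrote and run that program directly, so the feed names, fetch targets, and parameters always match what was saved. A minimal sketch (it reuses `args` and `test_reader` from above; `prepare_feed` is a hypothetical, dataset-specific helper that turns one batch into per-input arrays):

```python
# Sketch of the load_inference_model path. Reuses args/test_reader from the
# script above; prepare_feed is a hypothetical helper that splits one batch
# into one array per feed target.
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

[inference_program, feed_target_names, fetch_targets] = \
    fluid.io.load_inference_model(dirname=args.model_path, executor=exe)

for batch_id, data in enumerate(test_reader()):
    feed_dict = dict(zip(feed_target_names, prepare_feed(data)))
    results = exe.run(inference_program,
                      feed=feed_dict,
                      fetch_list=fetch_targets)
    print(results[0])
```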

Model code:

```python
sparse_embed_seq = list(starmap(embedding_layer, zip(words[1 : 1 + sparse_input_length], sparse_attr_names, sparse_attr_size)))

concated = fluid.layers.concat(sparse_embed_seq  + words[0:1] , axis=1)

last_deep_layer = deepnet(concated, args)
last_cross_layer = crossnet(concated, args)

top_fc_input = fluid.layers.concat([last_deep_layer, last_cross_layer], axis=1)

top_fc = fc_block(top_fc_input, np.fromstring(args.top_fc_block,dtype=int, sep="-").tolist())

ctr_fc = fluid.layers.fc(input=top_fc,size=128,act='relu',param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(scale=1 / math.sqrt(top_fc.shape[1]))))

#duration_fc = fluid.layers.fc(input=top_fc,size=128,act='relu',param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(scale=1 / math.sqrt(top_fc.shape[1]))))

ctr_predict = fluid.layers.fc(input=ctr_fc,size=2, act="softmax", param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(scale=1 / math.sqrt(ctr_fc.shape[1]))))

dura_predict = fluid.layers.fc(input=ctr_fc, size=1, act=None, param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(scale=1 / math.sqrt(ctr_fc.shape[1]))))
```
kuke commented 5 years ago

Most likely some input (the label, for example) was included when the inference model was saved, but is not being fed at prediction time. Please check this.
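One way to check is to load the saved inference model and print the feed names it expects; if a label variable shows up there, it was captured in `feeded_var_names` when the model was saved. A sketch (assuming the model was saved to `args.model_path`):

```python
# Sketch: inspect which feed variables the saved inference model expects.
# If a label variable appears in feed_target_names, it must either be fed at
# prediction time or excluded when the model is saved.
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
[prog, feed_target_names, fetch_targets] = fluid.io.load_inference_model(
    dirname=args.model_path, executor=exe)
print(feed_target_names)
```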

maosengshulei commented 5 years ago

I checked the model-saving code:

```python
names = [x.name for x in words[:-1]]
print(names)
fluid.io.save_inference_model(
    dirname=infer_model_dir,
    feeded_var_names=names,
    target_vars=[ctr_predict, dura_predict],
    executor=executor)
```

It did indeed save an extra label, but adding the label to the feed list during offline inference still produces the same error.
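If the label still ends up among the saved feed variables, one option is to filter `feeded_var_names` by name rather than by position. A sketch (the "label" substring is an assumption about the label variable's name; adjust it to the real name):

```python
# Sketch: filter feed variables by name so a label variable can never slip
# into feeded_var_names, regardless of where it sits in `words`.
# The "label" substring is an assumed naming convention.
names = [x.name for x in words if "label" not in x.name]
print(names)  # verify that only real inputs remain before saving
fluid.io.save_inference_model(
    dirname=infer_model_dir,
    feeded_var_names=names,
    target_vars=[ctr_predict, dura_predict],
    executor=executor)
```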

maosengshulei commented 5 years ago

Also, another model of mine was saved the same way (it also saved the extra label), yet it ran offline inference normally without the label in the feed_list.

kuke commented 5 years ago

You need to run `exe.run(fluid.default_startup_program())`.
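A minimal sketch combining this suggestion with the loading code from the report: run the startup program first so every parameter gets its default initialization, then let `load_vars` overwrite the parameters that exist on disk, leaving nothing uninitialized for the `mul` op to trip over.

```python
# Sketch of the suggested fix: initialize all parameters via the startup
# program, then overwrite the ones that were saved to args.model_path.
import os
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
ctr_predict, dura_predict, words = DCN(args, feat_list.features)

exe.run(fluid.default_startup_program())  # initialize every parameter first

def if_exist(var):
    return os.path.exists(os.path.join(args.model_path, var.name))

fluid.io.load_vars(executor=exe, dirname=args.model_path, predicate=if_exist)
```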