PaddlePaddle / Paddle

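PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 『飞桨』 core framework: high-performance single-machine and distributed training for deep learning and machine learning, plus cross-platform deployment)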
http://www.paddlepaddle.org/
Apache License 2.0
22.29k stars, 5.61k forks

fluid.io.save_inference_model raises ValueError: var read_file_0.tmp_3 not in this block #27001

Status: closed (yellmi closed this 4 years ago)

yellmi commented 4 years ago

To get your issue resolved quickly, please check for similar issues before opening a new one: [search issue keywords] [filter by labels] [official documentation]

   1) PaddlePaddle version: 1.8.2 (paddle 1.7 does not raise this error)
   3) GPU: P40, CUDA 9, cuDNN 7.3
   4) System environment: Python 3.7

juncaipeng commented 4 years ago

Please provide the complete code, thanks.

yellmi commented 4 years ago

Added.

eshaoliu commented 4 years ago

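Thanks for adding the code. On my machine, though, every version from 1.6 upward raises this error, and only 1.5 does not. It may be caused by program pruning.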
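The comment above suggests that pruning causes the error. When `save_inference_model` saves a program, it keeps only the part that the target variables depend on. A feed name that the targets never consume would therefore no longer exist in the saved block. Judging by their positions in the input list, `read_file_0.tmp_3` and `read_file_0.tmp_8` are probably the `task_ids` inputs, but that is an assumption. The toy model below illustrates the idea in plain Python. It walks a topologically ordered op list backwards from the targets and reports which feed names would be dropped. The `Op` type, the variable names and the graph are all hypothetical, and this is not Paddle's actual pruning code.

```python
from typing import List, NamedTuple, Set


class Op(NamedTuple):
    """Hypothetical, simplified operator: names of its input and output variables."""
    inputs: List[str]
    outputs: List[str]


def needed_vars(ops: List[Op], targets: List[str]) -> Set[str]:
    """Return every variable the targets transitively depend on.

    Assumes ops are in topological order, so a single backward pass suffices:
    an op is kept only if it produces a variable already known to be needed.
    """
    needed = set(targets)
    for op in reversed(ops):
        if any(out in needed for out in op.outputs):
            needed.update(op.inputs)
    return needed


def dropped_feeds(ops: List[Op], feeds: List[str], targets: List[str]) -> List[str]:
    """Feed names that a program pruned to `targets` would no longer contain."""
    needed = needed_vars(ops, targets)
    return [name for name in feeds if name not in needed]


if __name__ == "__main__":
    # Toy graph: the score depends on src_ids and input_mask, but not on task_ids.
    toy_ops = [
        Op(inputs=["src_ids", "task_ids"], outputs=["unused_emb"]),
        Op(inputs=["src_ids"], outputs=["emb"]),
        Op(inputs=["emb", "input_mask"], outputs=["score"]),
    ]
    print(dropped_feeds(toy_ops, ["src_ids", "task_ids", "input_mask"], ["score"]))
    # prints ['task_ids']
```

Under this model, any name returned by `dropped_feeds` must also be removed from the feed list passed to the save call. It must also be removed from the inputs fed at inference time.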

eshaoliu commented 4 years ago

I tried removing the redundant inputs:

# Drop the redundant read_file_0 outputs from the feed list before saving
feed_targets_name.remove('read_file_0.tmp_3')
feed_targets_name.remove('read_file_0.tmp_8')
feed_targets_name.remove('read_file_0.tmp_12')

fluid.io.save_inference_model(
    model_path,
    feed_targets_name,
    [left_score, type_probs],
    # [left_score, right_score, type_probs],
    exe,
    main_program=predict_prog,
)

Before the change, the inputs were built like this:

inputs = [array2tensor(ndarray) for ndarray in [ src_ids_1, sent_ids_1, pos_ids_1, task_ids_1, input_mask_1, src_ids_2, sent_ids_2, pos_ids_2, task_ids_2, input_mask_2,

After the change (task_ids_1 and task_ids_2 removed):

inputs = [array2tensor(ndarray) for ndarray in [ src_ids_1, sent_ids_1, pos_ids_1, input_mask_1, src_ids_2, sent_ids_2, pos_ids_2, input_mask_2,

But this raises the following error:
2020-09-04 12:29:20 Error: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 4875287335277170688. Please check input value.
2020-09-04 12:29:20 Error: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 4715268811004936192. Please check input value.
2020-09-04 12:29:20 Error: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 4771563806325014528. Please check input value.
2020-09-04 12:29:20 Error: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 4833769776202604544. Please check input value.

The data itself is not out of range, though.
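The reported ids, such as 4875287335277170688, are far outside any vocabulary. The sketch below tests one hypothesis: the 8 bytes behind each int64 id were actually written as two float32 values, for example because a float32 array reached an int64 id slot. It reinterprets the four values from the log with NumPy `ndarray.view`. On these numbers each id decodes to a pair of small whole numbers below 18000; for example, 4715268811004936192 becomes 950.0 and 15.0. That would be consistent with float32 id data or a positional mix-up, but it is not a confirmed diagnosis.

The inputs here are passed positionally. For that reason, the block also includes a hypothetical helper, `ordered_inputs`. It builds the input list by name, in the order the model was saved with, and rejects non-int64 arrays for inputs whose names end in `_ids`. That naming rule is an assumption based on the variable names in this thread.

```python
import numpy as np

# Out-of-range "ids" copied from the lookup_table_op.cu assertions above.
BAD_IDS = [
    4875287335277170688,
    4715268811004936192,
    4771563806325014528,
    4833769776202604544,
]


def as_float32_pairs(values):
    """Reinterpret each little-endian int64 as two little-endian float32 values."""
    raw = np.asarray(values, dtype="<i8")
    return raw.view("<f4").reshape(-1, 2)


def ordered_inputs(arrays, feed_names):
    """Hypothetical helper: return arrays in saved feed order, checking id dtypes.

    arrays: dict mapping feed variable name -> numpy array.
    feed_names: the feed order the inference model was saved with.
    """
    missing = [n for n in feed_names if n not in arrays]
    extra = [n for n in arrays if n not in feed_names]
    if missing or extra:
        raise ValueError(f"missing={missing}, extra={extra}")
    ordered = []
    for name in feed_names:
        arr = arrays[name]
        # Assumption: names like "src_ids_1" feed an embedding lookup,
        # so they must hold int64 ids rather than floats.
        if name.rstrip("_0123456789").endswith("_ids") and arr.dtype != np.int64:
            raise TypeError(f"{name} has dtype {arr.dtype}, expected int64")
        ordered.append(arr)
    return ordered


if __name__ == "__main__":
    for bad, (lo, hi) in zip(BAD_IDS, as_float32_pairs(BAD_IDS)):
        print(bad, "->", lo, hi)
```

If the decoded pairs look like plausible token ids, a useful next step would be to print each input's dtype just before it is converted with `array2tensor`.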

eshaoliu commented 4 years ago

2020-09-04 13:15:07
2020-09-04 13:15:07 --------------------------------------------
2020-09-04 13:15:07 C++ Call Stacks (More useful to developers):
2020-09-04 13:15:07 --------------------------------------------
2020-09-04 13:15:07 0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int)
2020-09-04 13:15:07 1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int)
2020-09-04 13:15:07 2 void paddle::operators::math::Blas<paddle::platform::CUDADeviceContext>::GEMM(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const, float const, float, float) const
2020-09-04 13:15:07 3 void paddle::operators::math::Blas<paddle::platform::CUDADeviceContext>::MatMul(paddle::framework::Tensor const&, bool, paddle::framework::Tensor const&, bool, float, paddle::framework::Tensor, float) const
2020-09-04 13:15:07 4 paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
2020-09-04 13:15:07 5 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const, char const, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
2020-09-04 13:15:07 6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext) const
2020-09-04 13:15:07 7 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
2020-09-04 13:15:07 8 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
2020-09-04 13:15:07 9 paddle::framework::NaiveExecutor::Run()
2020-09-04 13:15:07 10 paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >, int)
2020-09-04 13:15:07
2020-09-04 13:15:07 ------------------------------------------
2020-09-04 13:15:07 Python Call Stacks (More useful to users):
2020-09-04 13:15:07 ------------------------------------------
2020-09-04 13:15:07 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2525, in append_op
2020-09-04 13:15:07     attrs=kwargs.get("attrs", None))
2020-09-04 13:15:07 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
2020-09-04 13:15:07     return self.main_program.current_block().append_op(*args, **kwargs)
2020-09-04 13:15:07 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 344, in fc
2020-09-04 13:15:07     "y_num_col_dims": 1})
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 74, in __compute_qkv
2020-09-04 13:15:07     bias_attr=name + '_key_fc.b_0',
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 144, in multi_head_attention
2020-09-04 13:15:07     q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 319, in encoder_layer
2020-09-04 13:15:07     name=name + '_multi_head_att',
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 389, in encoder
2020-09-04 13:15:07     name=name + 'layer' + str(i),
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/ernie.py", line 195, in _build_model
2020-09-04 13:15:07     name='encoder',
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/ernie.py", line 108, in __init__
2020-09-04 13:15:07     input_mask,
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/finetune/type_pairwise_ranker.py", line 51, in cls_from_ernie
2020-09-04 13:15:07     use_fp16=use_fp16,
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/finetune/type_pairwise_ranker.py", line 154, in create_model
2020-09-04 13:15:07     use_fp16=args.use_fp16,
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 130, in main
2020-09-04 13:15:07     is_prediction=True,
2020-09-04 13:15:07 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 479, in <module>
2020-09-04 13:15:07     main(args)
2020-09-04 13:15:07
2020-09-04 13:15:07 ----------------------
2020-09-04 13:15:07 Error Message Summary:
2020-09-04 13:15:07 ----------------------
2020-09-04 13:15:07 Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.
2020-09-04 13:15:07 - New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
2020-09-04 13:15:07 - Recommended issue content: all error stack information
2020-09-04 13:15:07 [Hint: CUBLAS_STATUS_EXECUTION_FAILED] at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34)
2020-09-04 13:15:07 [operator < mul > error]

The failure is also inconsistent: after reducing the number of inputs, this error is sometimes raised instead (on 1.7.1).

eshaoliu commented 4 years ago

2020-09-04 15:27:55 W0904 15:27:55.232426 93 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle 2020-09-04 15:27:55 W0904 15:27:55.232430 93 init.cc:221] The detail failure signal is: 2020-09-04 15:27:55 2020-09-04 15:27:55 W0904 15:27:55.232434 93 init.cc:224] Aborted at 1599204475 (unix time) try "date -d @1599204475" if you are using GNU date 2020-09-04 15:27:55 W0904 15:27:55.234194 93 init.cc:224] PC: @ 0x0 (unknown) 2020-09-04 15:27:55 W0904 15:27:55.234395 93 init.cc:224] SIGSEGV (@0x0) received by PID 93 (TID 0x7f80b5ca8740) from PID 0; stack trace: 2020-09-04 15:27:55 W0904 15:27:55.235718 93 init.cc:224] @ 0x7f80b58805d0 (unknown) 2020-09-04 15:27:55 W0904 15:27:55.240444 93 init.cc:224] @ 0x7f7fa2d2203b paddle::framework::Variable::GetMutable<>() 2020-09-04 15:27:55 W0904 15:27:55.241524 93 init.cc:224] @ 0x7f7fa61f1b7e _ZZN6paddle9framework2ir8patternsL13BuildFusionV2EPNS1_5GraphERKSsPNS0_5ScopeEENKUlRKSt13unordered_mapIPNS1_6PDNodeEPNS1_4NodeESt4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SD_EEES4_E0_clESOS4.isra.753 2020-09-04 15:27:55 W0904 15:27:55.244629 93 init.cc:224] @ 0x7f7fa6352bd6 paddle::framework::ir::GraphPatternDetector::operator()() 2020-09-04 15:27:55 W0904 15:27:55.247149 93 init.cc:224] @ 0x7f7fa61e067a paddle::framework::ir::MultiHeadMatmulV2FusePass::ApplyImpl() 2020-09-04 15:27:55 W0904 15:27:55.248822 93 init.cc:224] @ 0x7f7fa637f4d2 paddle::framework::ir::Pass::Apply() 2020-09-04 15:27:55 W0904 15:27:55.251083 93 init.cc:224] @ 0x7f7fa6111d69 paddle::inference::analysis::IRPassManager::Apply() 2020-09-04 15:27:55 W0904 15:27:55.253607 93 init.cc:224] @ 0x7f7fa610e7d0 paddle::inference::analysis::IrAnalysisPass::RunImpl() 2020-09-04 15:27:55 W0904 15:27:55.257437 93 init.cc:224] @ 0x7f7fa610a01b paddle::inference::analysis::Analyzer::RunAnalysis() 2020-09-04 15:27:55 W0904 15:27:55.260177 93 init.cc:224] @ 0x7f7fa30e882d 
paddle::AnalysisPredictor::OptimizeInferenceProgram() 2020-09-04 15:27:55 W0904 15:27:55.263805 93 init.cc:224] @ 0x7f7fa30e91f7 paddle::AnalysisPredictor::PrepareProgram() 2020-09-04 15:27:55 W0904 15:27:55.265928 93 init.cc:224] @ 0x7f7fa30e9397 paddle::AnalysisPredictor::Init() 2020-09-04 15:27:55 W0904 15:27:55.269109 93 init.cc:224] @ 0x7f7fa30e97ca paddle::CreatePaddlePredictor<>() 2020-09-04 15:27:55 W0904 15:27:55.272568 93 init.cc:224] @ 0x7f7fa30ea351 paddle::CreatePaddlePredictor<>() 2020-09-04 15:27:55 W0904 15:27:55.273885 93 init.cc:224] @ 0x7f7fa2fb934d _ZZN8pybind1112cpp_function10initializeIRPFSt10unique_ptrIN6paddle15PaddlePredictorESt14default_deleteIS4_EERKNS3_14AnalysisConfigEES7_ISA_EINS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENKUlRNS_6detail13function_callEE1clESU 2020-09-04 15:27:55 W0904 15:27:55.275130 93 init.cc:224] @ 0x7f7fa2fb93be _ZZN8pybind1112cpp_function10initializeIRPFSt10unique_ptrIN6paddle15PaddlePredictorESt14default_deleteIS4_EERKNS3_14AnalysisConfigEES7_JSA_EJNS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4FUNESU 2020-09-04 15:27:55 W0904 15:27:55.276345 93 init.cc:224] @ 0x7f7fa2d5b529 pybind11::cpp_function::dispatcher() 2020-09-04 15:27:55 W0904 15:27:55.319316 93 init.cc:224] @ 0x55d3bf40cfd4 _PyCFunction_FastCallDict 2020-09-04 15:27:55 W0904 15:27:55.319588 93 init.cc:224] @ 0x55d3bf49abec call_function 2020-09-04 15:27:55 W0904 15:27:55.320116 93 init.cc:224] @ 0x55d3bf4bf19a _PyEval_EvalFrameDefault 2020-09-04 15:27:55 W0904 15:27:55.320359 93 init.cc:224] @ 0x55d3bf4947db fast_function 2020-09-04 15:27:55 W0904 15:27:55.320606 93 init.cc:224] @ 0x55d3bf49acc5 call_function 2020-09-04 15:27:55 W0904 15:27:55.320947 93 init.cc:224] @ 0x55d3bf4bf19a _PyEval_EvalFrameDefault 2020-09-04 15:27:55 W0904 15:27:55.321285 93 init.cc:224] @ 0x55d3bf495529 PyEval_EvalCodeEx 2020-09-04 15:27:55 W0904 15:27:55.321607 93 init.cc:224] @ 0x55d3bf4962cc PyEval_EvalCode 
2020-09-04 15:27:55 W0904 15:27:55.321858 93 init.cc:224] @ 0x55d3bf512af4 run_mod 2020-09-04 15:27:55 W0904 15:27:55.322158 93 init.cc:224] @ 0x55d3bf512ef1 PyRun_FileExFlags 2020-09-04 15:27:55 W0904 15:27:55.322491 93 init.cc:224] @ 0x55d3bf5130f4 PyRun_SimpleFileExFlags 2020-09-04 15:27:55 W0904 15:27:55.322788 93 init.cc:224] @ 0x55d3bf516c28 Py_Main 2020-09-04 15:27:55 W0904 15:27:55.323122 93 init.cc:224] @ 0x55d3bf3de71e main 2020-09-04 15:27:55 W0904 15:27:55.340014 93 init.cc:224] @ 0x7f80b54c63d5 __libc_start_main 2020-09-04 15:27:55 W0904 15:27:55.340442 93 init.cc:224] @ 0x55d3bf4c5c98 (unknown) 2020-09-04 15:29:10 sh: line 1: 93 Segmentation fault (core dumped)

This is the error with 1.8.1 (after reducing the number of inputs).

eshaoliu commented 4 years ago

The model definition is here:

def create_model(
    args,
    pyreader_name,
    ernie_config,
    is_prediction=False,
    task_name="",
    is_classify=False,
    is_regression=False,
    ernie_version="1.0",
):
    """create_model"""
    if is_classify:
        pyreader = fluid.layers.py_reader(
            capacity=50,
            shapes=[
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                #[-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                #[-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                #[-1, args.max_seq_len, 1],
                [-1, 1], [-1, 1], [-1, 1],
            ],
            dtypes=[
                'int64', 'int64', 'int64', 'int64', 'float32',
                'int64', 'int64', 'int64', 'int64', 'float32',
                #'int64', 'int64', 'int64', 'int64', 'float32',
                'int64', 'int64', 'int64',
            ],
            lod_levels=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            name=task_name + "_" + pyreader_name,
            use_double_buffer=True,
        )

    (
        src_ids_1, sent_ids_1, pos_ids_1, task_ids_1, input_mask_1,
        src_ids_2, sent_ids_2, pos_ids_2, task_ids_2, input_mask_2,
        #src_ids_3, sent_ids_3, pos_ids_3, task_ids_3, input_mask_3,
        labels, types, qids,
    ) = fluid.layers.read_file(pyreader)

    cls_feats_query = cls_from_ernie(
        args, src_ids=src_ids_1, position_ids=pos_ids_1, sentence_ids=sent_ids_1,
        task_ids=task_ids_1, input_mask=input_mask_1, config=ernie_config,
        use_fp16=args.use_fp16,
    )

    cls_feats_left = cls_from_ernie(
        args, src_ids=src_ids_2, position_ids=pos_ids_2, sentence_ids=sent_ids_2,
        task_ids=task_ids_2, input_mask=input_mask_2, config=ernie_config,
        use_fp16=args.use_fp16,
    )
    '''
    cls_feats_right = cls_from_ernie(
        args, src_ids=src_ids_3, position_ids=pos_ids_3, sentence_ids=sent_ids_3,
        task_ids=task_ids_3, input_mask=input_mask_3, config=ernie_config,
        use_fp16=args.use_fp16,
    )
    '''
    left_concat = fluid.layers.concat(
        input=[cls_feats_query, cls_feats_left], axis=-1,
    )

    #right_concat = fluid.layers.concat(
    #    input=[cls_feats_query, cls_feats_right], axis=-1,
    #)

    left_score = fluid.layers.fc(
        input=left_concat,
        size=1,
        param_attr=fluid.ParamAttr(
            name=task_name + "_cls_out_w_left",
            initializer=fluid.initializer.TruncatedNormal(scale=0.02),
        ),
        bias_attr=fluid.ParamAttr(
            name=task_name + "_cls_out_b_left",
            initializer=fluid.initializer.Constant(0.),
        ),
    )
    left_score = fluid.layers.sigmoid(left_score)
    '''
    right_score = fluid.layers.fc(
        input=right_concat,
        size=1,
        param_attr=fluid.ParamAttr(
            name=task_name + "_cls_out_w_right",
            initializer=fluid.initializer.TruncatedNormal(scale=0.02),
        ),
        bias_attr=fluid.ParamAttr(
            name=task_name + "_cls_out_b_right",
            initializer=fluid.initializer.Constant(0.),
        ),
    )
    right_score = fluid.layers.sigmoid(right_score)
    '''
    type_out = fluid.layers.fc(
        input=cls_feats_query,
        size=24,
        param_attr=fluid.ParamAttr(
            name=task_name + "_cls_out_w_type",
            initializer=fluid.initializer.TruncatedNormal(scale=0.02),
        ),
        bias_attr=fluid.ParamAttr(
            name=task_name + "_cls_out_b_type",
            initializer=fluid.initializer.Constant(0.),
        ),
    )

    if is_prediction:
        left_probs = left_score
        #right_probs = right_score
        type_probs = fluid.layers.softmax(type_out)
        feed_targets_name = [
            src_ids_1.name, sent_ids_1.name, pos_ids_1.name, input_mask_1.name,
            src_ids_2.name, sent_ids_2.name, pos_ids_2.name, input_mask_2.name,
            #src_ids_3.name, sent_ids_3.name, pos_ids_3.name, task_ids_3.name, input_mask_3.name,
            #qids.name,
        ]
        ret = {}
        ret['pyreader'] = pyreader
        ret['left_probs'] = left_probs
        #ret['right_probs'] = right_probs
        ret['type_probs'] = type_probs
        ret['feed_targets_name'] = feed_targets_name
        return ret

    num_seqs = fluid.layers.create_tensor(dtype='int64')
    labels = fluid.layers.cast(x=labels, dtype="float32")
    types = fluid.layers.cast(x=types, dtype="int64")
    '''
    label_loss = fluid.layers.rank_loss(
        label=labels, left=left_score, right=right_score,
    )
    '''
    label_loss = fluid.layers.log_loss(input=left_score, label=labels)
    type_loss, probs = fluid.layers.softmax_with_cross_entropy(
        logits=type_out, label=types, return_softmax=True,
    )

    loss = fluid.layers.mean(x=label_loss) + fluid.layers.mean(x=type_loss)
    graph_vars = {
        "loss": loss,
        "left_score": left_score,
        #"right_score": right_score,
        "labels": labels,
        "probs": probs,
        "types": types,
        "num_seqs": num_seqs,
        "qids": qids,
    }

    return pyreader, graph_vars

Loading the trained checkpoint and calling save_inference_model:

if args.init_checkpoint:
    init_pretraining_params(exe, args.init_checkpoint, predict_prog)
else:
    raise ValueError(
        "args 'init_checkpoint' should be set for prediction!",
    )

assert args.save_inference_model_path, \
    "args save_inference_model_path should be set for prediction"
_, ckpt_dir = os.path.split(args.init_checkpoint.rstrip('/'))
dir_name = ckpt_dir + '_inference_model'
model_path = os.path.join(args.save_inference_model_path, dir_name)
log.info("save inference model to %s" % model_path)
log.info("feed_targets_name %s" % feed_targets_name)

#feed_targets_name.remove('read_file_0.tmp_3')
#feed_targets_name.remove('read_file_0.tmp_8')
#feed_targets_name.remove('read_file_0.tmp_12')

fluid.io.save_inference_model(
    model_path,
    feed_targets_name,
    [left_score, type_probs],
    #[left_score, right_score, type_probs],
    exe,
    main_program=predict_prog,
)

The prediction code is here:

config = AnalysisConfig(model_path)
score_config = AnalysisConfig(score_model_path)
if not args.use_cuda:
    log.info("disable gpu")
    config.disable_gpu()
else:
    log.info("using gpu")
    config.enable_use_gpu(1024)

if not args.use_cuda:
    log.info("disable gpu")
    score_config.disable_gpu()
else:
    log.info("using gpu")
    score_config.enable_use_gpu(1024)

# Create PaddlePredictor
predictor = create_paddle_predictor(config)
score_predictor = create_paddle_predictor(score_config)

predict_data_generator = reader.data_generator(
    input_file=args.predict_set,
    batch_size=args.batch_size,
    epoch=1,
    shuffle=False,
)

log.info("-------------- prediction results --------------")
np.set_printoptions(precision=4, suppress=True)
index = 0
total_time = 0
qid_total = None
left_score_total = None
#right_score_total = None
type_prob_total = None
ent_id_total = None
for sample in predict_data_generator():
    src_ids_1 = sample[0]
    sent_ids_1 = sample[1]
    pos_ids_1 = sample[2]
    task_ids_1 = sample[3]
    input_mask_1 = sample[4]
    src_ids_2 = sample[5]
    sent_ids_2 = sample[6]
    pos_ids_2 = sample[7]
    task_ids_2 = sample[8]
    input_mask_2 = sample[9]
    #src_ids_3 = sample[10]
    #sent_ids_3 = sample[11]
    #pos_ids_3 = sample[12]
    #task_ids_3 = sample[13]
    #input_mask_3 = sample[14]
    #qids = sample[15]
    #ent_ids = sample[16]
    qids = sample[10]
    ent_ids = sample[11]
    for arr in src_ids_1:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('src', val)
    for arr in src_ids_2:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('target', val)
    for arr in pos_ids_1:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('src', val)
    for arr in pos_ids_2:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('target', val)
    for arr in sent_ids_1:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('src', val)
    for arr in sent_ids_2:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('target', val)
    for arr in input_mask_1:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('src', val)
    for arr in input_mask_2:
        for val in arr:
            if not (val[0] >= 0 and val[0] < 18000):
                print('target', val)
    '''
    print('src_ids_1', src_ids_1)
    print('sent_ids_1', sent_ids_1)
    print('pos_ids_1', pos_ids_1)
    print('task_ids_1', task_ids_1)
    print('input_mask_1', input_mask_1)
    print('src_ids_2', src_ids_2)
    print('sent_ids_2', sent_ids_2)
    print('pos_ids_2', pos_ids_2)
    print('task_ids_2', task_ids_2)
    print('input_mask_2', input_mask_2)
    print('qids', qids)
    print('ent_ids', ent_ids)
    '''
    inputs = [array2tensor(ndarray) for ndarray in [
        src_ids_1, sent_ids_1, pos_ids_1, input_mask_1,
        src_ids_2, sent_ids_2, pos_ids_2, input_mask_2,
        #src_ids_3, sent_ids_3, pos_ids_3, task_ids_3, input_mask_3,
        #qids,
    ]]
    #print('inputs', inputs)
    begin_time = time.time()
    outputs = predictor.run(inputs)
    score_outputs = score_predictor.run(inputs)
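The nested per-element range checks in the prediction loop can be collapsed into one vectorized NumPy comparison per array. This is a minimal sketch, assuming each array is a NumPy array shaped [batch, max_seq_len, 1] as the reader produces, and reusing the 18000 bound from the lookup_table error; `report_out_of_range` is a hypothetical helper, not part of Paddle or ERNIE.

```python
import numpy as np


def report_out_of_range(name, arr, low=0, high=18000):
    """Print and return (index, value) for every entry of `arr` outside [low, high).

    Hypothetical helper: a single vectorized comparison replaces the nested
    per-element loops; `high` defaults to the 18000 bound from the error.
    """
    arr = np.asarray(arr)
    bad = (arr < low) | (arr >= high)
    hits = []
    for idx in np.argwhere(bad):  # offending entries in row-major order
        key = tuple(int(i) for i in idx)
        hits.append((key, arr[key].item()))
    for key, val in hits:
        print(name, key, val)
    return hits


if __name__ == "__main__":
    # Toy batch shaped [batch, max_seq_len, 1], like the reader's id arrays.
    src_ids_1 = np.array([[[5], [17999]], [[18000], [-1]]], dtype=np.int64)
    report_out_of_range("src", src_ids_1)
```

Called once per array inside the loop (for example `report_out_of_range('src', src_ids_1)`), it flags the same entries as the original loops, because those arrays have a trailing dimension of 1; it also returns the hits, so a check can fail fast instead of only printing.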

@juncaipeng With 1.5 everything works; 1.7 reports out-of-range data; 1.8 dumps core. The error with 1.6 looks like this:
2020-09-06 10:55:05 Exception: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 1073741824. Please check input value. 2020-09-06 10:55:05 Exception: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 1073741824. Please check input value. 2020-09-06 10:55:05 F0906 10:55:05.171013 93 device_context.cc:318] cudaStreamSynchronize unspecified launch failure errno: 4 2020-09-06 10:55:05 Check failure stack trace: 2020-09-06 10:55:05 @ 0x7fc5d490e8bd google::LogMessage::Fail() 2020-09-06 10:55:05 @ 0x7fc5d491236c google::LogMessage::SendToLog() 2020-09-06 10:55:05 @ 0x7fc5d490e3e3 google::LogMessage::Flush() 2020-09-06 10:55:05 @ 0x7fc5d491387e google::LogMessageFatal::~LogMessageFatal() 2020-09-06 10:55:05 @ 0x7fc5d6faacb7 paddle::platform::CUDADeviceContext::Wait() 2020-09-06 10:55:05 @ 0x7fc5d5a23e31 paddle::operators::StackKernel<>::Compute() 2020-09-06 10:55:05 @ 0x7fc5d5a242a3 ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EINS0_9operators11StackKernelINS7_17CUDADeviceContextEfEENSA_ISB_dEENSA_ISB_iEENSA_ISB_lEENSA_ISB_NS7_7float16EEEEEclEPKcSK_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4 2020-09-06 10:55:05 @ 0x7fc5d6f2b06b paddle::framework::OperatorWithKernel::RunImpl() 2020-09-06 10:55:05 @ 0x7fc5d6f2b573 paddle::framework::OperatorWithKernel::RunImpl() 2020-09-06 10:55:05 @ 0x7fc5d6f2549c paddle::framework::OperatorBase::Run() 2020-09-06 10:55:05 @ 0x7fc5d6edfef0 paddle::framework::NaiveExecutor::Run() 2020-09-06 10:55:05 @ 0x7fc5d49e946c paddle::AnalysisPredictor::Run() 2020-09-06 10:55:05 @ 0x7fc5d48de326 
ZZN8pybind1112cpp_function10initializeIZN6paddle6pybind12_GLOBAL__N_121BindAnalysisPredictorEPNS_6moduleEEUlRNS2_17AnalysisPredictorERKSt6vectorINS2_12PaddleTensorESaISA_EEE_SC_IS8_SE_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESW 2020-09-06 10:55:05 @ 0x7fc5d47e0ad6 pybind11::cpp_function::dispatcher() 2020-09-06 10:55:05 @ 0x55f938f81fd4 _PyCFunction_FastCallDict 2020-09-06 10:55:05 @ 0x55f93900fd3e call_function 2020-09-06 10:55:05 @ 0x55f93903419a _PyEval_EvalFrameDefault 2020-09-06 10:55:05 @ 0x55f9390097db fast_function 2020-09-06 10:55:05 @ 0x55f93900fcc5 call_function 2020-09-06 10:55:05 @ 0x55f93903419a _PyEval_EvalFrameDefault 2020-09-06 10:55:05 @ 0x55f93900a529 PyEval_EvalCodeEx 2020-09-06 10:55:05 @ 0x55f93900b2cc PyEval_EvalCode 2020-09-06 10:55:05 @ 0x55f939087af4 run_mod 2020-09-06 10:55:05 @ 0x55f939087ef1 PyRun_FileExFlags 2020-09-06 10:55:05 @ 0x55f9390880f4 PyRun_SimpleFileExFlags 2020-09-06 10:55:05 @ 0x55f93908bc28 Py_Main 2020-09-06 10:55:05 @ 0x55f938f5371e main 2020-09-06 10:55:05 @ 0x7fc6d89493d5 __libc_start_main 2020-09-06 10:55:05 @ 0x55f93903ac98 (unknown) 2020-09-06 10:55:22 sh: line 1: 93 Aborted

eshaoliu commented 4 years ago

CUDA 9:
2020-09-06 19:16:05 I0906 19:16:05.972730 103 analysis_predictor.cc:833] MODEL VERSION: 1.7.1 2020-09-06 19:16:05 I0906 19:16:05.972748 103 analysis_predictor.cc:835] PREDICTOR VERSION: 1.7.1 2020-09-06 19:16:05 --- Running analysis [ir_graph_build_pass] 2020-09-06 19:16:06 --- Running analysis [ir_graph_clean_pass] 2020-09-06 19:16:06 --- Running analysis [ir_analysis_pass] 2020-09-06 19:16:06 --- Running IR pass [is_test_pass] 2020-09-06 19:16:06 --- Running IR pass [simplify_with_basic_ops_pass] 2020-09-06 19:16:06 --- Running IR pass [conv_affine_channel_fuse_pass] 2020-09-06 19:16:06 --- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass] 2020-09-06 19:16:06 --- Running IR pass [conv_bn_fuse_pass] 2020-09-06 19:16:06 --- Running IR pass [conv_eltwiseadd_bn_fuse_pass] 2020-09-06 19:16:06 --- Running IR pass [multihead_matmul_fuse_pass] 2020-09-06 19:16:18 I0906 19:16:18.076560 103 graph_pattern_detector.cc:101] --- detected 24 subgraphs 2020-09-06 19:16:18 --- Running IR pass [fc_fuse_pass] 2020-09-06 19:16:18 I0906 19:16:18.142786 103 graph_pattern_detector.cc:101] --- detected 24 subgraphs 2020-09-06 19:16:18 I0906 19:16:18.184167 103 graph_pattern_detector.cc:101] --- detected 52 subgraphs 2020-09-06 19:16:18 --- Running IR pass [fc_elementwise_layernorm_fuse_pass] 2020-09-06 19:16:18 I0906 19:16:18.245363 103 graph_pattern_detector.cc:101] --- detected 48 subgraphs 2020-09-06 19:16:18 --- Running IR pass [conv_elementwise_add_act_fuse_pass] 2020-09-06 19:16:18 --- Running IR pass [conv_elementwise_add2_act_fuse_pass] 2020-09-06 19:16:18 --- Running IR pass [conv_elementwise_add_fuse_pass] 2020-09-06 19:16:18 --- Running IR pass [transpose_flatten_concat_fuse_pass] 2020-09-06 19:16:18 --- Running IR pass [runtime_context_cache_pass] 2020-09-06 19:16:18 --- Running analysis [ir_params_sync_among_devices_pass] 2020-09-06 19:16:18 I0906 19:16:18.272835 103 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU 2020-09-06 19:16:18 --- 
Running analysis [adjust_cudnn_workspace_size_pass] 2020-09-06 19:16:18 --- Running analysis [inference_op_replace_pass] 2020-09-06 19:16:18 --- Running analysis [ir_graph_to_program_pass] 2020-09-06 19:16:18 I0906 19:16:18.492741 103 analysis_predictor.cc:462] ======= optimize end ======= 2020-09-06 19:16:20 2020-09-06 19:16:20,959-INFO: -------------- prediction results -------------- 2020-09-06 19:16:20 [INFO] 2020-09-06 19:16:20,959 [infer_type_ranker.py: 286]: -------------- prediction results -------------- 2020-09-06 19:16:20 2020-09-06 19:16:20,959-INFO: prepare_batch_data 2020-09-06 19:16:20 [INFO] 2020-09-06 19:16:20,959 [type_pairwise_ranker_reader.py: 174]: prepare_batch_data 2020-09-06 19:16:21 W0906 19:16:21.051622 103 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set. 2020-09-06 19:16:21 W0906 19:16:21.051646 103 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead. 
2020-09-06 19:16:21 W0906 19:16:21.051649 103 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference 2020-09-06 19:16:21 ----------place----------- 2020-09-06 19:16:21 CUDAPlace(0) 2020-09-06 19:16:21 save inference_model done 2020-09-06 19:16:21 config <paddle.fluid.core_avx.AnalysisConfig object at 0x7f994d042ab0> 2020-09-06 19:16:21 Traceback (most recent call last): 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 519, in 2020-09-06 19:16:21 main(args) 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 370, in main 2020-09-06 19:16:21 outputs = predictor.run(inputs) 2020-09-06 19:16:21 paddle.fluid.core_avx.EnforceNotMet: 2020-09-06 19:16:21 2020-09-06 19:16:21 -------------------------------------------- 2020-09-06 19:16:21 C++ Call Stacks (More useful to developers): 2020-09-06 19:16:21 -------------------------------------------- 2020-09-06 19:16:21 0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int) 2020-09-06 19:16:21 1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int) 2020-09-06 19:16:21 2 void paddle::operators::math::Blaspaddle::platform::CUDADeviceContext::GEMM(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const, float const, float, float) const 2020-09-06 19:16:21 3 void paddle::operators::math::Blaspaddle::platform::CUDADeviceContext::MatMul(paddle::framework::Tensor const&, bool, paddle::framework::Tensor const&, bool, float, paddle::framework::Tensor, float) const 2020-09-06 19:16:21 4 paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const 2020-09-06 19:16:21 5 std::_Function_handler<void (paddle::framework::ExecutionContext const&), 
paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const, char const, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) 2020-09-06 19:16:21 6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext) const 2020-09-06 19:16:21 7 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const 2020-09-06 19:16:21 8 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&) 2020-09-06 19:16:21 9 paddle::framework::NaiveExecutor::Run() 2020-09-06 19:16:21 10 paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocatorpaddle::PaddleTensor > const&, std::vector<paddle::PaddleTensor, std::allocatorpaddle::PaddleTensor >, int) 2020-09-06 19:16:21 2020-09-06 19:16:21 ------------------------------------------ 2020-09-06 19:16:21 Python Call Stacks (More useful to users): 2020-09-06 19:16:21 ------------------------------------------ 2020-09-06 19:16:21 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2525, in append_op 2020-09-06 19:16:21 attrs=kwargs.get("attrs", None)) 2020-09-06 19:16:21 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op 2020-09-06 19:16:21 return self.main_program.current_block().append_op(*args, **kwargs) 2020-09-06 19:16:21 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 344, in fc 2020-09-06 19:16:21 "y_num_col_dims": 1}) 2020-09-06 
19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 64, in __compute_qkv 2020-09-06 19:16:21 bias_attr=name + '_query_fc.b_0', 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 144, in multi_head_attention 2020-09-06 19:16:21 q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value) 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 319, in encoder_layer 2020-09-06 19:16:21 name=name + '_multi_head_att', 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 389, in encoder 2020-09-06 19:16:21 name=name + '_layer_' + str(i), 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/ernie.py", line 195, in _build_model 2020-09-06 19:16:21 name='encoder', 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/ernie.py", line 108, in __init__ 2020-09-06 19:16:21 input_mask, 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/finetune/type_pairwise_ranker.py", line 51, in cls_from_ernie 2020-09-06 19:16:21 use_fp16=use_fp16, 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/finetune/type_pairwise_ranker.py", line 160, in create_model 2020-09-06 19:16:21 use_fp16=args.use_fp16, 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 129, in main 2020-09-06 19:16:21 is_prediction=True, 2020-09-06 19:16:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 519, in <module> 2020-09-06 19:16:21 main(args) 2020-09-06 19:16:21 2020-09-06 19:16:21 ---------------------- 2020-09-06 19:16:21 Error Message Summary: 2020-09-06 19:16:21 ---------------------- 2020-09-06 
19:16:21 Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority. 2020-09-06 19:16:21 - New issue link: https://github.com/PaddlePaddle/Paddle/issues/new 2020-09-06 19:16:21 - Recommended issue content: all error stack information 2020-09-06 19:16:21 [Hint: CUBLAS_STATUS_EXECUTION_FAILED] at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34) 2020-09-06 19:16:21 [operator < mul > error]

The job is:FAILED

@juncaipeng @yellmi Let's focus on the 1.7.1 problem: save_inference_model succeeds there, but the run fails at predictor.run.

eshaoliu commented 4 years ago

2020-09-06 19:40:09 Error: /paddle/paddle/fluid/operators/lookup_table_op.cu:43 Assertion id < N failed. Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 18000, but got 4875287335277170688. Please check input value.

The out-of-range error appears when two predictors are running predictions.

juncaipeng commented 4 years ago


This prediction error is usually caused by a mismatched input data type; the input type should be int32.
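One quick way to weigh the dtype explanation is to look at the raw bits of the bad ids reported in this thread (4875287335277170688 in the 1.7 log, 1073741824 in the 1.6 log). The sketch below only reinterprets those bytes with NumPy's `ndarray.view`, using explicit little-endian dtypes so the result does not depend on the host. Read as float32 pairs they decode to 2689.0, 337.0 and 2.0, 0.0, all small whole numbers inside the 18000-entry vocabulary, whereas read as int32 pairs they stay huge. That pattern is consistent with an id tensor that held float32 data being read as int64; it is a hint worth checking, not a confirmed diagnosis.

```python
import numpy as np

# Out-of-range ids copied from the lookup_table errors quoted in this thread.
bad_ids = np.array([4875287335277170688, 1073741824], dtype="<i8")

# Reinterpret the raw bytes: each little-endian int64 becomes two float32s
# (low 32 bits first). No values are converted, only re-labelled.
as_float32 = bad_ids.view("<f4").reshape(-1, 2)

# The same bytes read as pairs of int32 values, for comparison.
as_int32 = bad_ids.view("<i4").reshape(-1, 2)

print(as_float32)
print(as_int32)
```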

eshaoliu commented 4 years ago


They were already int64 when I defined them, so that is not the problem.
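Declaring int64 in `py_reader` fixes what the graph expects, but at prediction time what matters is the dtype of each NumPy array handed to `array2tensor`. A cheap guard is to assert those dtypes right before building `inputs`. This is a minimal sketch, assuming the eight feeds keep the order of the reduced `inputs` list (ids as int64, masks as float32, matching the `dtypes` in `create_model`) and that `array2tensor` preserves the NumPy dtype; `EXPECTED_FEED_DTYPES` and `check_feed_dtypes` are hypothetical names.

```python
import numpy as np

# Hypothetical: dtypes the saved program expects, in the order of the reduced
# `inputs` list: (src_ids, sent_ids, pos_ids, input_mask) for query and left.
EXPECTED_FEED_DTYPES = [np.int64, np.int64, np.int64, np.float32] * 2


def check_feed_dtypes(arrays, expected=EXPECTED_FEED_DTYPES):
    """Raise if the feed count or any array's dtype differs from what is expected."""
    if len(arrays) != len(expected):
        raise ValueError("expected %d feeds, got %d" % (len(expected), len(arrays)))
    for i, (arr, want) in enumerate(zip(arrays, expected)):
        got = np.asarray(arr).dtype
        if got != np.dtype(want):
            raise TypeError("feed %d: expected %s, got %s" % (i, np.dtype(want), got))
```

Inside the loop it would be called on the same eight arrays that are passed to `array2tensor`, so a silent float32 or int32 array fails loudly in Python instead of surfacing later as an out-of-range embedding id.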

eshaoliu commented 4 years ago

2020-09-07 17:56:21 2020-09-07 17:56:21,265-INFO: -------------- prediction results -------------- 2020-09-07 17:56:21 [INFO] 2020-09-07 17:56:21,265 [infer_type_ranker.py: 286]: -------------- prediction results -------------- 2020-09-07 17:56:21 2020-09-07 17:56:21,265-INFO: prepare_batch_data 2020-09-07 17:56:21 [INFO] 2020-09-07 17:56:21,265 [type_pairwise_ranker_reader.py: 174]: prepare_batch_data 2020-09-07 17:56:21 W0907 17:56:21.373543 103 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set. 2020-09-07 17:56:21 W0907 17:56:21.373571 103 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead. 2020-09-07 17:56:21 W0907 17:56:21.373575 103 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference 2020-09-07 17:56:21 ----------place----------- 2020-09-07 17:56:21 CUDAPlace(0) 2020-09-07 17:56:21 save inference_model done 2020-09-07 17:56:21 ----------place----------- 2020-09-07 17:56:21 CUDAPlace(0) 2020-09-07 17:56:21 config <paddle.fluid.core_avx.AnalysisConfig object at 0x7f736a15fca8> 2020-09-07 17:56:21 Traceback (most recent call last): 2020-09-07 17:56:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 519, in 2020-09-07 17:56:21 main(args) 2020-09-07 17:56:21 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 370, in main 2020-09-07 17:56:21 outputs = predictor.run(inputs) 2020-09-07 17:56:21 paddle.fluid.core_avx.EnforceNotMet: 2020-09-07 17:56:21 2020-09-07 17:56:21 -------------------------------------------- 2020-09-07 17:56:21 C++ Call Stacks (More useful to developers): 2020-09-07 17:56:21 -------------------------------------------- 2020-09-07 17:56:21 0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int) 2020-09-07 17:56:21 
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int) 2020-09-07 17:56:21 2 void paddle::operators::MatMulWithHeadQK(paddle::platform::CUDADeviceContext const&, int, int, int, int, bool, bool, float, float, float, float const, float, float) 2020-09-07 17:56:21 3 void paddle::operators::MultiHeadGPUCompute(paddle::platform::CUDADeviceContext const&, int, paddle::framework::DDim const&, paddle::framework::DDim const&, paddle::framework::DDim const&, float const, float const, float const, float const, float const, float const, float const, float, float, float, bool, bool, bool) 2020-09-07 17:56:21 4 paddle::operators::MultiHeadMatMulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const 2020-09-07 17:56:21 5 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MultiHeadMatMulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MultiHeadMatMulKernel<paddle::platform::CUDADeviceContext, double> >::operator()(char const, char const, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) 2020-09-07 17:56:21 6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext) const 2020-09-07 17:56:21 7 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const 2020-09-07 17:56:21 8 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&) 2020-09-07 17:56:21 9 paddle::framework::NaiveExecutor::Run() 2020-09-07 17:56:21 10 paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator > const&, std::vector<paddle::PaddleTensor, std::allocator >, 
int) 2020-09-07 17:56:21 2020-09-07 17:56:21 ---------------------- 2020-09-07 17:56:21 Error Message Summary: 2020-09-07 17:56:21 ---------------------- 2020-09-07 17:56:21 Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority. 2020-09-07 17:56:21 - New issue link: https://github.com/PaddlePaddle/Paddle/issues/new 2020-09-07 17:56:21 - Recommended issue content: all error stack information 2020-09-07 17:56:21 [Hint: CUBLAS_STATUS_EXECUTION_FAILED] at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:51) 2020-09-07 17:56:21

Please focus on this error. This run did not report the out-of-range error; the code was not changed, and the error message is not stable either. Version 1.7.1.

eshaoliu commented 4 years ago

Is this related to https://github.com/PaddlePaddle/Paddle/issues/20921? Could it be "caused by the batch of optimization passes introduced for transformer"?
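One way to test the question above is to rebuild the predictors with IR graph optimization switched off, so fusion passes such as `multihead_matmul_fuse_pass` (listed in the 1.7.1 analysis log; `MultiHeadMatmulV2FusePass` appears in the 1.8.1 backtrace) never run. This sketch assumes the Paddle 1.x Python `AnalysisConfig` methods `enable_use_gpu`, `disable_gpu` and `switch_ir_optim`; the config class is passed in only so the helper can be exercised without Paddle, and `build_config` is a hypothetical name. If the errors disappear with `ir_optim=False`, that would point at the fusion passes rather than the input data, though it would not prove it.

```python
def build_config(config_cls, model_dir, use_gpu, ir_optim=False, gpu_mem_mb=1024):
    """Create an inference config, optionally with IR graph optimization off.

    config_cls is expected to be paddle.fluid.core.AnalysisConfig (Paddle 1.x);
    taking it as a parameter keeps this sketch testable without Paddle.
    """
    config = config_cls(model_dir)
    if use_gpu:
        # Same initial GPU memory pool size (MB) as the original script.
        config.enable_use_gpu(gpu_mem_mb)
    else:
        config.disable_gpu()
    # With ir_optim=False the IR fusion passes are skipped.
    config.switch_ir_optim(ir_optim)
    return config
```

With Paddle 1.x installed, it would replace the two hand-built configs, for example `predictor = create_paddle_predictor(build_config(AnalysisConfig, model_path, args.use_cuda))`, and likewise for `score_model_path`.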

eshaoliu commented 4 years ago

Another error looks like this:
2020-09-07 18:24:32 config <paddle.fluid.core_avx.AnalysisConfig object at 0x7fbbc6d9cc70> 2020-09-07 18:24:32 Traceback (most recent call last): 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 521, in <module> 2020-09-07 18:24:32 main(args) 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 372, in main 2020-09-07 18:24:32 outputs = predictor.run(inputs) 2020-09-07 18:24:32 paddle.fluid.core_avx.EnforceNotMet: 2020-09-07 18:24:32 2020-09-07 18:24:32 -------------------------------------------- 2020-09-07 18:24:32 C++ Call Stacks (More useful to developers): 2020-09-07 18:24:32 -------------------------------------------- 2020-09-07 18:24:32 0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int) 2020-09-07 18:24:32 1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int) 2020-09-07 18:24:32 2 void paddle::operators::math::Blas::GEMM(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const, float const, float, float) const 2020-09-07 18:24:32 3 void paddle::operators::math::Blas::MatMul(paddle::framework::Tensor const&, bool, paddle::framework::Tensor const&, bool, float, paddle::framework::Tensor, float) const 2020-09-07 18:24:32 4 paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const 2020-09-07 18:24:32 5 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const, char const, int) 
const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) 2020-09-07 18:24:32 6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext) const 2020-09-07 18:24:32 7 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const 2020-09-07 18:24:32 8 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&) 2020-09-07 18:24:32 9 paddle::framework::NaiveExecutor::Run() 2020-09-07 18:24:32 10 paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator > const&, std::vector<paddle::PaddleTensor, std::allocator >, int) 2020-09-07 18:24:32 2020-09-07 18:24:32 ------------------------------------------ 2020-09-07 18:24:32 Python Call Stacks (More useful to users): 2020-09-07 18:24:32 ------------------------------------------ 2020-09-07 18:24:32 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2525, in append_op 2020-09-07 18:24:32 attrs=kwargs.get("attrs", None)) 2020-09-07 18:24:32 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op 2020-09-07 18:24:32 return self.main_program.current_block().append_op(*args, **kwargs) 2020-09-07 18:24:32 File "/usr/local/anaconda3/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 344, in fc 2020-09-07 18:24:32 "y_num_col_dims": 1}) 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 74, in __compute_qkv 2020-09-07 18:24:32 bias_attr=name + '_key_fc.b_0', 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 144, in multi_head_attention 2020-09-07 18:24:32 q, k, v = __compute_qkv(queries, keys, values, n_head, 
d_key, d_value) 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 319, in encoder_layer 2020-09-07 18:24:32 name=name + '_multi_head_att', 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/transformer_encoder.py", line 389, in encoder 2020-09-07 18:24:32 name=name + 'layer' + str(i), 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/ernie.py", line 195, in _build_model 2020-09-07 18:24:32 name='encoder', 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/model/ernie.py", line 108, in init__ 2020-09-07 18:24:32 input_mask, 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/finetune/type_pairwise_ranker.py", line 51, in cls_from_ernie 2020-09-07 18:24:32 use_fp16=use_fp16, 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/finetune/type_pairwise_ranker.py", line 149, in create_model 2020-09-07 18:24:32 use_fp16=args.use_fp16, 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 129, in main 2020-09-07 18:24:32 is_prediction=True, 2020-09-07 18:24:32 File "/media/cfs/liuhongru3/Research-master/KG/DuEL_Baseline/ernie/infer_type_ranker.py", line 521, in 2020-09-07 18:24:32 main(args) 2020-09-07 18:24:32 2020-09-07 18:24:32 ---------------------- 2020-09-07 18:24:32 Error Message Summary: 2020-09-07 18:24:32 ---------------------- 2020-09-07 18:24:32 Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority. 
2020-09-07 18:24:32 - New issue link: https://github.com/PaddlePaddle/Paddle/issues/new 2020-09-07 18:24:32 - Recommended issue content: all error stack information 2020-09-07 18:24:32 [Hint: CUBLAS_STATUS_INTERNAL_ERROR] at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34) 2020-09-07 18:24:32 [operator < mul > error]
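The out-of-range id reported by lookup_table earlier in this thread (4875287335277170688) is far too large to be a real token id. One quick diagnostic, offered as a hypothesis and not as a confirmed root cause, is to reinterpret its 8 bytes as two float32 values. This tests whether float data was read through an int64 input slot. The sketch below uses only Python's `struct` module and assumes a little-endian host such as x86. The helper name is hypothetical.

```python
import struct


def int64_as_float32_pair(bad_id: int) -> tuple:
    """Reinterpret the 8 bytes of a signed int64 as two float32 values.

    Assumes little-endian byte order (as on x86 hosts), so the first
    value comes from the low 4 bytes and the second from the high 4 bytes.
    """
    raw = struct.pack("<q", bad_id)   # int64 -> 8 raw bytes, little-endian
    return struct.unpack("<2f", raw)  # same 8 bytes -> two float32 values


if __name__ == "__main__":
    print(int64_as_float32_pair(4875287335277170688))
```

For this id the two halves decode to the integer-valued floats 2689.0 and 337.0. That result would be consistent with token ids having been passed as float32 where int64 was expected. Checking the dtype of each ndarray handed to `array2tensor` would confirm or rule this out.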

eshaoliu commented 4 years ago

Sometimes I get this error instead; both errors occur on version 1.7.1: Error: cudaMemcpyAsync failed in paddle::platform::GpuMemcpyAsync (0x555b688be580 -> 0x7f97c6000000, length: 96) error code : 4, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038: unspecified launch failure at (/paddle/paddle/fluid/platform/gpu_info.cc:314)
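CUDA reports kernel faults asynchronously. As a result, the operation named in a trace (here the `mul` operator or a `cudaMemcpyAsync`) is not necessarily the one that faulted. This process had already hit an out-of-range embedding id before the cuBLAS and CUDA failures, so it may be worth ruling out bad inputs before treating those failures as separate bugs.

Below is a minimal NumPy sketch of a host-side check to run before calling the predictor. It makes several assumptions. `check_feed`, the input names, and the per-input bounds are all hypothetical. The value 18000 is the bound quoted in the lookup_table error, and 513 is only a placeholder. The check also assumes that id inputs must be int64.

```python
import numpy as np


def check_feed(feeds, bounds):
    """Return human-readable problems found in the arrays about to be fed.

    feeds:  dict of input name -> numpy array (names are illustrative).
    bounds: dict of id-input name -> exclusive upper bound of its embedding
            table (hypothetical values; read the real ones from the model config).
    """
    problems = []
    for name, upper in bounds.items():
        if name not in feeds:
            problems.append(f"{name}: missing from feed")
            continue
        arr = np.asarray(feeds[name])
        # Assumed requirement: ids must be int64. Float bytes reinterpreted
        # as int64 produce huge, meaningless ids.
        if arr.dtype != np.int64:
            problems.append(f"{name}: dtype {arr.dtype}, expected int64")
            continue
        if arr.size == 0:
            continue
        lo, hi = int(arr.min()), int(arr.max())
        if lo < 0 or hi >= upper:
            problems.append(f"{name}: ids span [{lo}, {hi}], expected [0, {upper})")
    return problems


if __name__ == "__main__":
    feeds = {
        "src_ids_1": np.array([[1, 2, 17999]], dtype=np.int64),
        "pos_ids_1": np.array([[0.0, 1.0, 2.0]], dtype=np.float32),
    }
    # 18000 comes from the lookup_table error; 513 is a placeholder bound.
    for problem in check_feed(feeds, {"src_ids_1": 18000, "pos_ids_1": 513}):
        print(problem)
```

The function returns a list instead of raising on the first problem. That way a single run reports every mismatched input, which is useful after reordering a long feed list such as the `inputs` list discussed above.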

juncaipeng commented 4 years ago

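Sorry, I didn't notice the new comments on this issue in time. Could you check whether the problem still occurs on the latest release, 1.8.4? 1.8.4 is the newest version we currently maintain.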