PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

paddlenlp In-batch Negatives模型预测报错 #2573

Closed · yrg5101 closed this issue 2 years ago

yrg5101 commented 2 years ago

Welcome, and thank you for reporting a PaddleNLP issue and contributing to the project! When filing your question, please provide the following information:

1. Download the model provided by PaddleNLP In-batch Negatives: https://bj.bcebos.com/v1/paddlenlp/models/inbatch_model.zip

2. Convert the dynamic-graph model to a static graph: python export_model.py --params_path checkpoints/inbatch/model_40/model_state.pdparams --output_path=./output

3. Run prediction with Paddle Inference: python deploy/python/predict.py --model_dir=./output

Error:

E0620 11:31:14.865458  9628 analysis_config.cc:95] Please compile with gpu to EnableGpu()
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
---    Fused 0 subgraphs into layer_norm op.
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
---    fused 0 pairs of fc gru patterns
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0620 11:31:17.156193  9628 fuse_pass_base.cc:57] ---  detected 74 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0620 11:31:17.166157  9628 fuse_pass_base.cc:57] ---  detected 24 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0620 11:31:20.554153  9628 fuse_pass_base.cc:57] ---  detected 74 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0620 11:31:22.861095  9628 analysis_predictor.cc:1007] ======= optimize end =======
I0620 11:31:22.864109  9628 naive_executor.cc:102] ---  skip [feed], feed -> token_type_ids
I0620 11:31:22.865096  9628 naive_executor.cc:102] ---  skip [feed], feed -> input_ids
I0620 11:31:23.048810  9628 naive_executor.cc:102] ---  skip [elementwise_div_0], fetch -> fetch
[2022-06-20 11:31:23,110] [ INFO] - Already cached C:\Users\Administrator\.paddlenlp\models\ernie-1.0\vocab.txt
Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/tx/PaddleNLP/applications/neural_search/recall/in_batch_negative/deploy/python/predict.py", line 290, in <module>
    res = predictor.predict(corpus_list, tokenizer)
  File "C:/Users/Administrator/Desktop/tx/PaddleNLP/applications/neural_search/recall/in_batch_negative/deploy/python/predict.py", line 234, in predict
    input_ids, segment_ids = convert_example({idx: text[0]}, tokenizer)
  File "C:/Users/Administrator/Desktop/tx/PaddleNLP/applications/neural_search/recall/in_batch_negative/deploy/python/predict.py", line 90, in convert_example
    pad_to_max_seq_len=pad_to_max_seq_len)
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\transformers\tokenizer_utils_base.py", line 2258, in __call__
    **kwargs)
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\transformers\tokenizer_utils_base.py", line 2332, in encode
    **kwargs,
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\transformers\tokenizer_utils.py", line 1006, in _encode_plus
    first_ids = get_input_ids(text)
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\transformers\tokenizer_utils.py", line 985, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "C:\Users\Administrator\Desktop\tx\PaddleNLP\paddlenlp\transformers\tokenizer_utils.py", line 780, in tokenize
    tokenized_text.extend(self._tokenize(token, **kwargs))
TypeError: _tokenize() got an unexpected keyword argument 'max_seq_len'
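The failure at the bottom of the trace is a plain keyword-forwarding problem: tokenize() forwards all of its **kwargs down to _tokenize(), whose signature in this PaddleNLP version does not accept max_seq_len. A minimal standalone sketch of the same failure mode (DummyTokenizer is hypothetical, not PaddleNLP's class):

```python
class DummyTokenizer:
    """Hypothetical stand-in for a tokenizer whose _tokenize takes no extra kwargs."""

    def _tokenize(self, text):
        # basic whitespace tokenization; note there is no **kwargs here
        return text.split()

    def tokenize(self, text, **kwargs):
        # blindly forwards every caller kwarg down to _tokenize -- the same
        # pattern that makes the max_seq_len argument from predict.py blow up
        return self._tokenize(text, **kwargs)


tok = DummyTokenizer()
print(tok.tokenize("hello world"))  # ['hello', 'world']

try:
    tok.tokenize("hello world", max_seq_len=64)
except TypeError as err:
    print(err)  # _tokenize() got an unexpected keyword argument 'max_seq_len'
```

A caller-side workaround is to drop the unsupported key (e.g. kwargs.pop("max_seq_len", None)) before forwarding, or to pass max_seq_len only to APIs that document it.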

yrg5101 commented 2 years ago

batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype="int64"),  # input
    Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype="int64"),  # segment
    Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype="int64"),  # segment
    Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype="int64"),  # segment
): fn(samples)

Fixed by adding dtype="int64" to each Pad.
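Why forcing dtype="int64" helps: the exported static graph declares its feed variables (input_ids, token_type_ids) with a fixed integer dtype, so the numpy arrays built by batchify_fn must match it exactly. The padding-plus-dtype behavior can be sketched in plain NumPy (pad_batch is an illustrative helper, not PaddleNLP's Pad class):

```python
import numpy as np


def pad_batch(samples, pad_val=0, dtype="int64"):
    """Pad variable-length token-id lists to the batch's max length.

    Forcing int64 here mirrors the dtype="int64" fix: the padded array's
    dtype must match what the static-graph predictor's feed expects.
    """
    max_len = max(len(s) for s in samples)
    out = np.full((len(samples), max_len), pad_val, dtype=dtype)
    for i, s in enumerate(samples):
        out[i, : len(s)] = s
    return out


batch = pad_batch([[1, 2, 3], [4, 5]])
print(batch.dtype)     # int64
print(batch.tolist())  # [[1, 2, 3], [4, 5, 0]]
```

Without an explicit dtype, the array's integer width can vary by platform (e.g. 32-bit on Windows), which is a common source of silent mismatches against an int64 feed.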