PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

How do I test the prediction results of the ernie-health model? #2827

Closed: charlieliu9999 closed this issue 1 year ago

charlieliu9999 commented 2 years ago

Can the ernie-health model be used with Taskflow for entity-relation extraction on medical text? If not, how can this be done?

LemonNoel commented 2 years ago

ernie-health has not been added to Taskflow yet. If you need prediction results, you can start from this PR for now; the final version should be cleaned up and merged this week.
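For comparison, PaddleNLP's generic Taskflow information-extraction API looks roughly like the sketch below. The schema is a made-up medical example, and the underlying Taskflow model is UIE rather than ernie-health, so this is only an illustration of the interface being asked about:

```python
# Minimal sketch of the generic Taskflow information-extraction API.
# Shown for comparison only: it is backed by UIE, not by ernie-health.
from paddlenlp import Taskflow

# Hypothetical medical schema: for the subject type "疾病" (disease), extract
# the relations "临床表现" (clinical manifestation) and "药物治疗" (drug therapy).
schema = {"疾病": ["临床表现", "药物治疗"]}
ie = Taskflow("information_extraction", schema=schema)
print(ie("糖尿病患者常见多饮多尿，一般使用二甲双胍治疗。"))
```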

LemonNoel commented 2 years ago

The deployment code has been merged. It can be used directly with models fine-tuned on the CBLUE datasets: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-health/cblue/deploy/predictor

charlieliu9999 commented 2 years ago

Thank you very much for providing the predictor code. I ran into two problems when running it and would like to ask about them:

1. I trained the model on CBLUE with `python train_spo.py --batch_size 12 --max_seq_length 300 --learning_rate 6e-5 --epochs 3`, running on CPU on a MacBook. spo_loss dropped from a very large value to a bit over 100, but spo f1 stays at 0 the whole time. Is this normal?

global step 2300, epoch: 1, batch: 2300, loss: 223.04558, ent_loss: 114.37089, spo_loss: 108.67469, speed: 0.71 steps/s
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 897/897 [46:15<00:00,  3.09s/it]
eval loss: 249.20973, entity f1: 0.00000, spo f1: 0.00000
[2022-07-27 13:26:15,749] [    INFO] - tokenizer config file saved in ./checkpoint/model_2300/tokenizer_config.json
[2022-07-27 13:26:15,750] [    INFO] - Special tokens file saved in ./checkpoint/model_2300/special_tokens_map.json

2. I then exported a static graph from an intermediate checkpoint and used it for inference, but running it fails with the error below:

python infer_spo.py --device cpu --dataset CMeIE --model_path_prefix ../../cblue/export_CMeIE/inference

[2022-07-27 14:27:25,541] [    INFO] - model_path_prefix   : ../../cblue/export_CMeIE/inference
[2022-07-27 14:27:25,541] [    INFO] - model_name_or_path  : ernie-health-chinese
[2022-07-27 14:27:25,541] [    INFO] - dataset             : CMeIE
[2022-07-27 14:27:25,541] [    INFO] - data_file           : None
[2022-07-27 14:27:25,541] [    INFO] - max_seq_length      : 300
[2022-07-27 14:27:25,541] [    INFO] - use_fp16            : False
[2022-07-27 14:27:25,542] [    INFO] - num_threads         : 4
[2022-07-27 14:27:25,542] [    INFO] - batch_size          : 20
[2022-07-27 14:27:25,542] [    INFO] - device              : cpu
[2022-07-27 14:27:25,542] [    INFO] - device_id           : 0
[2022-07-27 14:27:25,542] [ WARNING] - Can't find the faster_tokenizer package, please ensure install faster_tokenizer correctly. You can install faster_tokenizer by `pip install faster_tokenizer`(Currently only work for linux platform).
[2022-07-27 14:27:25,542] [    INFO] - We are using <class 'paddlenlp.transformers.electra.tokenizer.ElectraTokenizer'> to load 'ernie-health-chinese'.
[2022-07-27 14:27:25,542] [    INFO] - Already cached /Users/lizzysong/.paddlenlp/models/ernie-health-chinese/vocab.txt
[2022-07-27 14:27:25,558] [    INFO] - tokenizer config file saved in /Users/lizzysong/.paddlenlp/models/ernie-health-chinese/tokenizer_config.json
[2022-07-27 14:27:25,558] [    INFO] - Special tokens file saved in /Users/lizzysong/.paddlenlp/models/ernie-health-chinese/special_tokens_map.json
[2022-07-27 14:27:25,558] [    INFO] - >>> [InferBackend] Creating Engine ...
[Paddle2ONNX] Start to parse PaddlePaddle model...
[Paddle2ONNX] Model file path: ../../cblue/export_CMeIE/inference.pdmodel
[Paddle2ONNX] Paramters file path: ../../cblue/export_CMeIE/inference.pdiparams
[Paddle2ONNX] Start to parsing Paddle model...
[Paddle2ONNX] Use opset_version = 13 for ONNX export.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
[2022-07-27 14:27:31,926] [    INFO] - >>> [InferBackend] Use CPU to inference ...
[2022-07-27 14:27:33,617] [    INFO] - >>> [InferBackend] Engine Created ...
Traceback (most recent call last):
  File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/infer_spo.py", line 69, in <module>
    predictor.predict(input_data)
  File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 320, in predict
    infer_result = self.infer_batch(encoded_inputs)
  File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 151, in infer_batch
    results = self._infer(input_dict)
  File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 140, in _infer
    infer_data = self.inference_backend.infer(input_dict)
  File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 116, in infer
    result = self.predictor.run(None, input_dict)
  File "/Users/lizzysong/opt/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
    raise ValueError("Model requires {} inputs. Input Feed contains {}".format(num_required_inputs, num_inputs))
ValueError: Model requires 4 inputs. Input Feed contains 3
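One way to narrow down the `ValueError: Model requires 4 inputs. Input Feed contains 3` error is to list the input tensors the exported graph actually declares and compare their names against the keys in the dict the tokenizer/predictor feeds in. A minimal sketch with plain onnxruntime; the file path is a placeholder for the ONNX model that Paddle2ONNX produces from inference.pdmodel / inference.pdiparams:

```python
# Sketch: print the inputs the exported ONNX graph declares, to compare
# against the feed built by the predictor (the model wants 4, the feed has 3).
import onnxruntime as ort

# Placeholder path; point it at the ONNX file converted by Paddle2ONNX.
sess = ort.InferenceSession("inference.onnx", providers=["CPUExecutionProvider"])
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)
```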
github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.