Closed charlieliu9999 closed 1 year ago
The deployment code has been merged, and models fine-tuned on the CBLUE datasets can use it directly: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-health/cblue/deploy/predictor
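Thank you very much for providing the predictor code. I ran into some problems when running it and would like to ask about them:
1. I trained the model on CBLUE with: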
python train_spo.py --batch_size 12 --max_seq_length 300 --learning_rate 6e-5 --epochs 3
The training ran on a MacBook using the CPU.
spo_loss has dropped from a very large value to just over 100, but spo f1 stays at 0. Is this normal?
global step 2300, epoch: 1, batch: 2300, loss: 223.04558, ent_loss: 114.37089, spo_loss: 108.67469, speed: 0.71 steps/s
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 897/897 [46:15<00:00, 3.09s/it]
eval loss: 249.20973, entity f1: 0.00000, spo f1: 0.00000
[2022-07-27 13:26:15,749] [ INFO] - tokenizer config file saved in ./checkpoint/model_2300/tokenizer_config.json
[2022-07-27 13:26:15,750] [ INFO] - Special tokens file saved in ./checkpoint/model_2300/special_tokens_map.json
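For context on the two f1 values in the log, here is a minimal sketch of how an exact-match F1 over extracted items is commonly computed. It is not the evaluation code from train_spo.py; the function name and the sample triples are made up for illustration. Under that assumption, F1 stays at exactly 0 until at least one predicted item matches a gold item exactly, no matter how far the loss has dropped.

```python
def exact_match_f1(predicted, gold):
    """Micro F1 where an item counts only if it matches a gold item exactly."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0  # also covers empty predictions and empty gold
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up (subject, predicate, object) triples, for illustration only.
gold = {("diabetes", "symptom", "thirst"), ("diabetes", "drug", "insulin")}
no_output = set()                                          # model predicts nothing
wrong_span = {("diabetes mellitus", "symptom", "thirst")}  # subject span is off
half_right = {("diabetes", "symptom", "thirst"), ("diabetes", "drug", "aspirin")}

for name, pred in [("no_output", no_output),
                   ("wrong_span", wrong_span),
                   ("half_right", half_right)]:
    print(name, exact_match_f1(pred, gold))
```

If the real metric works this way, an entity f1 of 0 alongside a spo f1 of 0 would suggest the model is not yet producing any exactly matching spans, which may be worth checking on a few evaluation samples.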
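2. I exported a static graph from an intermediate checkpoint and used it for prediction, but the run failed with the following error: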
python infer_spo.py --device cpu --dataset CMeIE --model_path_prefix ../../cblue/export_CMeIE/inference
[2022-07-27 14:27:25,541] [ INFO] - model_path_prefix : ../../cblue/export_CMeIE/inference
[2022-07-27 14:27:25,541] [ INFO] - model_name_or_path : ernie-health-chinese
[2022-07-27 14:27:25,541] [ INFO] - dataset : CMeIE
[2022-07-27 14:27:25,541] [ INFO] - data_file : None
[2022-07-27 14:27:25,541] [ INFO] - max_seq_length : 300
[2022-07-27 14:27:25,541] [ INFO] - use_fp16 : False
[2022-07-27 14:27:25,542] [ INFO] - num_threads : 4
[2022-07-27 14:27:25,542] [ INFO] - batch_size : 20
[2022-07-27 14:27:25,542] [ INFO] - device : cpu
[2022-07-27 14:27:25,542] [ INFO] - device_id : 0
[2022-07-27 14:27:25,542] [ WARNING] - Can't find the faster_tokenizer package, please ensure install faster_tokenizer correctly. You can install faster_tokenizer by `pip install faster_tokenizer`(Currently only work for linux platform).
[2022-07-27 14:27:25,542] [ INFO] - We are using <class 'paddlenlp.transformers.electra.tokenizer.ElectraTokenizer'> to load 'ernie-health-chinese'.
[2022-07-27 14:27:25,542] [ INFO] - Already cached /Users/lizzysong/.paddlenlp/models/ernie-health-chinese/vocab.txt
[2022-07-27 14:27:25,558] [ INFO] - tokenizer config file saved in /Users/lizzysong/.paddlenlp/models/ernie-health-chinese/tokenizer_config.json
[2022-07-27 14:27:25,558] [ INFO] - Special tokens file saved in /Users/lizzysong/.paddlenlp/models/ernie-health-chinese/special_tokens_map.json
[2022-07-27 14:27:25,558] [ INFO] - >>> [InferBackend] Creating Engine ...
[Paddle2ONNX] Start to parse PaddlePaddle model...
[Paddle2ONNX] Model file path: ../../cblue/export_CMeIE/inference.pdmodel
[Paddle2ONNX] Paramters file path: ../../cblue/export_CMeIE/inference.pdiparams
[Paddle2ONNX] Start to parsing Paddle model...
[Paddle2ONNX] Use opset_version = 13 for ONNX export.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
[2022-07-27 14:27:31,926] [ INFO] - >>> [InferBackend] Use CPU to inference ...
[2022-07-27 14:27:33,617] [ INFO] - >>> [InferBackend] Engine Created ...
Traceback (most recent call last):
File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/infer_spo.py", line 69, in <module>
predictor.predict(input_data)
File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 320, in predict
infer_result = self.infer_batch(encoded_inputs)
File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 151, in infer_batch
results = self._infer(input_dict)
File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 140, in _infer
infer_data = self.inference_backend.infer(input_dict)
File "/Users/lizzysong/PaddleNLP/model_zoo/ernie-health/deploy/predictor/predictor.py", line 116, in infer
result = self.predictor.run(None, input_dict)
File "/Users/lizzysong/opt/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
raise ValueError("Model requires {} inputs. Input Feed contains {}".format(num_required_inputs, num_inputs))
ValueError: Model requires 4 inputs. Input Feed contains 3
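One way to narrow down the error above is to compare the input names the exported model declares with the keys of the feed dict. The sketch below is only an illustration: `diff_inputs` is a hypothetical helper, and the input names are assumed, not read from the actual inference.pdmodel. With onnxruntime, the declared names can be obtained from `session.get_inputs()`.

```python
def diff_inputs(required_names, feed):
    """Return (missing, unexpected) input names for a feed dict."""
    required = list(required_names)
    missing = [name for name in required if name not in feed]
    unexpected = [name for name in feed if name not in required]
    return missing, unexpected


# With a real onnxruntime.InferenceSession this list would come from:
#   required = [inp.name for inp in session.get_inputs()]
# The names below are assumptions for illustration only.
required = ["input_ids", "token_type_ids", "position_ids", "attention_mask"]

# Placeholder feed with three entries, mirroring "Input Feed contains 3".
feed = {"input_ids": None, "token_type_ids": None, "position_ids": None}

missing, unexpected = diff_inputs(required, feed)
print("missing:", missing)
print("unexpected:", unexpected)
```

Printing the two lists just before `self.predictor.run(None, input_dict)` in predictor.py should show which input the exported graph expects that the predictor does not supply, and whether the export script and the predictor come from matching versions.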
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Can the ernie-health model be used with Taskflow for entity and relation extraction on medical text? If not, what approach should be used instead?