PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: After running the aspect-aggregation example for PaddleNLP's uie-senta model, synonymous aspects are not extracted as expected #5980

Open Jason916 opened 1 year ago

Jason916 commented 1 year ago

Software environment

- paddlepaddle:2.4.2
- paddlepaddle-gpu: 2.4.2.post116
- paddlenlp: 2.5.0

Duplicate issues

Bug description

Run:
>>> schema = [{'评价维度': ['观点词', '情感倾向[正向,负向,未提及]']}]
>>> senta = Taskflow("sentiment_analysis", model="uie-senta-nano", schema=schema, task_path="./checkpoint/model_best")
>>> senta("这家点的房间很大,店家服务也很热情,就是房间隔音不好")

Returns:
[{'评价维度': [{'text': '隔音'}, {'text': '价格'}]}]

Steps to reproduce & code

1. Use the label_studio.json provided on the official site.
2. python label_studio.py --label_studio_file ./data/label_studio.json --synonym_file ./data/synonyms.txt --task_type ext --save_dir ./data --splits 0.8 0.1 0.1 --options "正向" "负向" "未提及" --negative_ratio 5 --is_shuffle True --seed 1000
3. python -u -m paddle.distributed.launch --gpus "0" finetune.py --train_path ./data/train.json --dev_path ./data/dev.json --save_dir ./checkpoint --learning_rate 1e-5 --batch_size 16 --max_seq_len 512 --num_epochs 3 --model uie-senta-base --seed 1000 --logging_steps 10 --valid_steps 100 --device gpu
4. Run the official example:
   aspects = ["隔音", "价格"]
   schema2 = [{"评价维度": ["观点词", "情感倾向[正向,负向,未提及]"]}]
   senta = Taskflow("sentiment_analysis", schema=schema2, model="uie-senta-nano", task_path='./checkpoint/model_best', aspects=aspects)
   res = senta("这家点的房间很大,店家服务也很热情,就是房间隔音不好")
   print(res)
5. Result obtained:
   [{'评价维度': [{'text': '隔音'}, {'text': '价格'}]}]
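The result in step 5 lists each aspect with only a `text` field, which is the symptom being reported. A minimal sketch for checking this programmatically follows. It assumes that a successful extraction would attach the opinion word and sentiment polarity under a `relations` key on each aspect entry; that key name is an assumption about the output layout and does not appear in this report. The helper name `aspects_without_relations` is hypothetical.

```python
from typing import Any, Dict, List

# Schema key used in this report for the aspect ("evaluation dimension").
ASPECT_KEY = "评价维度"


def aspects_without_relations(
    results: List[Dict[str, Any]], aspect_key: str = ASPECT_KEY
) -> List[str]:
    """Return the text of every aspect that carries no extracted relations.

    Assumption: a populated entry would hold its opinion word and sentiment
    under a "relations" key; entries without it are treated as empty.
    """
    missing = []
    for record in results:
        for aspect in record.get(aspect_key, []):
            if not aspect.get("relations"):
                missing.append(aspect["text"])
    return missing


# The output copied from step 5 of this report.
reported = [{"评价维度": [{"text": "隔音"}, {"text": "价格"}]}]
print(aspects_without_relations(reported))
```

On the reported output, both aspects are flagged, which matches the complaint that no opinion or sentiment was attached to either of them.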

gongel commented 1 year ago

Hello, what exactly do you mean by "not working"?

Jason916 commented 1 year ago

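> Hello, what exactly do you mean by "not working"?

By "not working" I mean that, in theory, running aspect extraction as in the official example should return the opinions for synonymous aspects, but in practice it does not.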
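To make the expected behavior concrete, here is a minimal conceptual sketch of synonym aggregation: opinions found under any word in a synonym group are merged under one canonical aspect. This is an illustration only, not PaddleNLP's implementation. The synonym groups, the extracted pairs, and the helper names `build_synonym_index` and `aggregate_by_aspect` are all hypothetical and are not taken from `synonyms.txt` or from the model output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_synonym_index(groups: List[List[str]]) -> Dict[str, str]:
    """Map every word in a group to the group's first word (the canonical aspect)."""
    index = {}
    for group in groups:
        canonical = group[0]
        for word in group:
            index[word] = canonical
    return index


def aggregate_by_aspect(
    extracted: List[Tuple[str, str]], index: Dict[str, str]
) -> Dict[str, List[str]]:
    """Merge (aspect, opinion) pairs so that synonymous aspects share one entry."""
    merged = defaultdict(list)
    for aspect, opinion in extracted:
        # Aspects missing from the index are kept under their own name.
        merged[index.get(aspect, aspect)].append(opinion)
    return dict(merged)


# Hypothetical synonym groups and extraction results, for illustration only.
groups = [["隔音", "噪音"], ["价格", "价钱"]]
extracted = [("噪音", "大"), ("隔音", "不好"), ("价钱", "贵")]

index = build_synonym_index(groups)
print(aggregate_by_aspect(extracted, index))
```

In this sketch, the opinions attached to "噪音" and "隔音" end up under the single aspect "隔音", which is the kind of grouped result the reporter expected to see.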