PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
11.98k stars 2.92k forks

Will fine-tuning code for information extraction with the ERNIE-Doc model be released in the future? #3420

Closed dingidng closed 1 year ago

dingidng commented 1 year ago

Problem description

I saw that ERNIE-Doc performs very well on document-level datasets.

I read the documentation: https://github.com/PaddlePaddle/ERNIE/blob/repro/ernie-doc/README_zh.md#%E4%BF%A1%E6%81%AF%E6%8A%BD%E5%8F%96

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-doc/README.md

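So far, fine-tuning code has been open-sourced for the English and Chinese classification tasks and for the Chinese reading comprehension task:

    sh script/run_imdb.sh     # English classification task
    sh script/run_iflytek.sh  # Chinese classification task
    sh script/run_dureader.sh # Chinese reading comprehension task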


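In addition, UIE, which the official site currently promotes, only supports text of length 512 for now. I would like to see how robust ERNIE-Doc is across different datasets once its fine-tuning code is open-sourced.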

wawltor commented 1 year ago

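We currently do not plan to use ERNIE-Doc for information extraction. UIE can support lengths of up to 2048, but in terms of results, its extraction quality on very long text is not good.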
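
For readers hitting the length limits discussed in this thread, a common workaround is to split a long document into overlapping windows and run the extractor on each window separately. The following is a minimal sketch of that idea only. It is not PaddleNLP's or UIE's implementation: the function name `split_with_overlap` and the `overlap` parameter are hypothetical, and lengths are counted in characters rather than model tokens.

```python
from typing import List, Tuple


def split_with_overlap(
    text: str, max_len: int = 512, overlap: int = 64
) -> List[Tuple[int, str]]:
    """Split text into windows of at most max_len characters.

    Hypothetical helper for illustration. Returns (start_offset, chunk)
    pairs so that spans found inside a chunk can be mapped back to
    positions in the original text by adding start_offset.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    chunks: List[Tuple[int, str]] = []
    step = max_len - overlap  # how far each new window advances
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + max_len]))
        if start + max_len >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    document = "x" * 1000  # stand-in for a long document
    for offset, chunk in split_with_overlap(document):
        print(offset, len(chunk))
```

The overlap reduces the chance that an entity is cut in half at a window boundary, but it does not help with relations that span more text than a single window, which is one reason windowed extraction tends to degrade on very long documents.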