PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

Entity recognition with the uie-base-en model splits English words apart #5247

Open · MachineSheep opened this issue 1 year ago

MachineSheep commented 1 year ago

Please describe your question

When I use the uie-base-en model for entity recognition, English words are often split apart, which produces incorrect results. Why does this happen?

MachineSheep commented 1 year ago

For example, for the sentence "I like it", the extracted result may be "ke it". Does uie-base-en use individual English characters as tokens?
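Why the tokenization question matters: an extracted span can only start and end where some token starts and ends. The sketch below is plain Python, not PaddleNLP code. It compares a hypothetical character-level tokenizer with simple whitespace word tokens and checks whether the span behind "ke it" (characters 4 to 9 of "I like it") is even expressible under each.

```python
import re

def boundaries(text, level):
    """Return (starts, ends): character offsets where a span may begin/end.

    level="char": every non-space character is its own token (hypothetical).
    level="word": whitespace-separated words are tokens.
    """
    if level == "char":
        spans = [(i, i + 1) for i, ch in enumerate(text) if not ch.isspace()]
    else:
        spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    return {s for s, _ in spans}, {e for _, e in spans}

def is_legal_span(text, start, end, level):
    """True if [start, end) begins at a token start and ends at a token end."""
    starts, ends = boundaries(text, level)
    return start in starts and end in ends

text = "I like it"
start, end = 4, 9                                # the span that yields "ke it"
print(text[start:end])                           # ke it
print(is_legal_span(text, start, end, "char"))   # True
print(is_legal_span(text, start, end, "word"))   # False
print(is_legal_span(text, 2, 9, "word"))         # True ("like it")
```

If "ke it" is returned, either the tokens really are that fine-grained or the mapping from token positions back to character offsets is off. This toy check cannot tell which.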

LiuChiachi commented 1 year ago

Could it be that some words are not in the uie-base-en vocabulary and are therefore split?
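This hypothesis assumes a subword tokenizer. The sketch below assumes a BERT-style WordPiece scheme, which this thread does not confirm for uie-base-en, and uses a made-up toy vocabulary. Greedy longest-match-first splitting breaks an out-of-vocabulary word into in-vocabulary pieces. A word that is already in the vocabulary, such as "the", stays a single token.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the "##" prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary (hypothetical, not the real uie-base-en vocabulary)
vocab = {"the", "world", "like", "it", "i", "paddle", "##nl", "##p"}

print(wordpiece("the", vocab))        # ['the']
print(wordpiece("world", vocab))      # ['world']
print(wordpiece("paddlenlp", vocab))  # ['paddle', '##nl', '##p']
```

Under this scheme, subword splits occur only inside out-of-vocabulary words. An in-vocabulary word like "the" offers no token boundary after "th".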

MachineSheep commented 1 year ago

> Could it be that some words are not in the uie-base-en vocabulary and are therefore split?

I don't think so. The model extracted "e world" from "the world" as an entity, and the word "the" should surely be in the vocabulary.