PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.11k stars, 2.94k forks

[Bug]: Taskflow("text_similarity") object raises an exception when called multiple times #6473

Closed zhupeifox closed 2 months ago

zhupeifox commented 1 year ago

Software environment

- paddlepaddle: 2.5.0
- paddlenlp: 2.5.2

Duplicate issues

Error description

Calling text_similarity repeatedly with the same content eventually raises an error:
RuntimeError: (PreconditionNotMet) The meta data must be valid when call the mutable data function.
  [Hint: Expected valid() == true, but received valid():0 != true:1.] (at ..\paddle\phi\core\dense_tensor.cc:122)
  [operator < fill_constant > error]

Steps to reproduce & code

    >>> from paddlenlp import Taskflow
    D:\xxx.venv\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
      warnings.warn("Setuptools is replacing distutils.")
    >>> similarity = Taskflow(
    ...     "text_similarity", model="rocketqa-zh-dureader-cross-encoder"
    ... )
    [2023-07-24 14:02:04,269] [ INFO] - Already cached C:\xxx.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\rocketqa-zh-dureader-vocab.txt
    [2023-07-24 14:02:04,280] [ INFO] - tokenizer config file saved in C:\xxx.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\tok
    [2023-07-24 14:02:04,282] [ INFO] - Special tokens file saved in C:\xxx.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\special_tokens_map.json
    >>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])
    [{'text1': 'XXXXX关于XXXXXXXXXXX的通知', 'text2': 'XXXXXXXXXXXXXXXX', 'similarity': 0.5412642955780029}]
    >>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])
    [{'text1': 'XXXXX关于XXXXXXXXXXX的通知', 'text2': 'XXXXXXXXXXXXXXXX', 'similarity': 0.4708636403083801}]
    >>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\Project\electronicrecordslibrary\TextTranslator.venv\lib\site-packages\paddlenlp\taskflow\taskflow.py", line 850, in __call__
        results = self.task_instance(inputs)
      File "D:\Project\electronicrecordslibrary\TextTranslator.venv\lib\site-packages\paddlenlp\taskflow\task.py", line 516, in __call__
        outputs = self._run_model(inputs)
      File "D:\Project\electronicrecordslibrary\TextTranslator.venv\lib\site-packages\paddlenlp\taskflow\text_similarity.py", line 279, in _run_model
        self.predictor.run()
    RuntimeError: (PreconditionNotMet) The meta data must be valid when call the mutable data function.
      [Hint: Expected valid() == true, but received valid():0 != true:1.] (at ..\paddle\phi\core\dense_tensor.cc:122)
      [operator < fill_constant > error]

zhupeifox commented 1 year ago

After some testing, this looks like it may be a model problem. The error occurs with "rocketqa-zh-dureader-cross-encoder" and "rocketqa-base-cross-encoder", while "simbert-base-chinese" produces the following error instead:

```
>>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])
  File "<stdin>", line 1, in <module>
    results = self.task_instance(inputs)
    outputs = self._run_model(inputs)
  File "D:\Project\electronicrecordslibrary\TextTranslator.venv\lib\site-packages\paddlenlp\taskflow\text_similarity.py", line 293, in _run_model
    self.predictor.run()
ValueError: (InvalidArgument) Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 512, but got -4988613369508970118. Please check input value.
  [Hint: Expected ids[i] >= 0, but received ids[i]:-4988613369508970118 < 0:0.] (at ..\paddle\phi\kernels\cpu\embedding_kernel.cc:76)
  [operator < lookup_table_v2 > error]
```

The "rocketqa-medium-cross-encoder" model works normally.
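The ValueError above is a bounds check: the embedding lookup expects every input id to satisfy 0 <= id < 512, and it received -4988613369508970118. As a minimal sketch of what that precondition means, the same check can be written in plain Python. The helper name `find_out_of_range_ids` and the sample ids are hypothetical and are not part of PaddleNLP; this only illustrates the condition the kernel reports, not how to obtain the ids from Taskflow.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(ids: Iterable[int], table_size: int) -> List[Tuple[int, int]]:
    """Return (position, value) for every id outside the range [0, table_size)."""
    return [(pos, value) for pos, value in enumerate(ids) if not 0 <= value < table_size]


# Hypothetical input: two in-range ids plus the value quoted in the error message.
sample_ids = [101, 511, -4988613369508970118]
print(find_out_of_range_ids(sample_ids, table_size=512))
# → [(2, -4988613369508970118)]
```

An empty list would mean every id passes the check that the embedding kernel enforces.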

w5688414 commented 1 year ago

Please post the code needed to reproduce this.

zhupeifox commented 1 year ago

    >>> from paddlenlp import Taskflow
    >>> similarity = Taskflow(
    ...     "text_similarity", model="rocketqa-zh-dureader-cross-encoder"
    ... )
    >>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])
    >>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])
    >>> similarity([["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]])

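  1. With rocketqa-zh-dureader-cross-encoder and rocketqa-base-cross-encoder, the first two calls return normal output and the third call raises the error described in this issue.
  2. With simbert-base-chinese, the error shown in my first reply occurs.
  3. With rocketqa-medium-cross-encoder, everything works normally.
  4. The test environment is Windows + CPU, installed via pip.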
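For anyone who wants to run the comparison above as a script rather than by hand in the REPL, here is a minimal sketch. It calls each task object several times with the same input and records the call on which it first fails. The helper names (`call_repeatedly`, `compare_models`, `run_against_taskflow`) are made up for illustration; the only PaddleNLP call used is `Taskflow("text_similarity", model=...)` as shown in this thread, and it is imported lazily so the helpers can be tested without paddlenlp installed.

```python
from typing import Any, Callable, Dict, Iterable, List


def call_repeatedly(fn: Callable[[Any], Any], inputs: Any, times: int = 3) -> Dict[str, Any]:
    """Call fn(inputs) up to `times` times and stop at the first exception."""
    results: List[Any] = []
    for attempt in range(1, times + 1):
        try:
            results.append(fn(inputs))
        except Exception as exc:  # record any failure, not only RuntimeError
            return {
                "results": results,
                "failed_on": attempt,  # 1-based index of the failing call
                "error": f"{type(exc).__name__}: {exc}",
            }
    return {"results": results, "failed_on": None, "error": None}


def compare_models(
    make_task: Callable[[str], Callable[[Any], Any]],
    model_names: Iterable[str],
    inputs: Any,
    times: int = 3,
) -> Dict[str, Dict[str, Any]]:
    """Build one task per model name and run the repeated-call check on each."""
    report: Dict[str, Dict[str, Any]] = {}
    for name in model_names:
        report[name] = call_repeatedly(make_task(name), inputs, times)
    return report


def run_against_taskflow(times: int = 3) -> Dict[str, Dict[str, Any]]:
    """Run the check against the four models mentioned in this thread."""
    from paddlenlp import Taskflow  # lazy import; requires paddlenlp to be installed

    models = [
        "rocketqa-zh-dureader-cross-encoder",
        "rocketqa-base-cross-encoder",
        "simbert-base-chinese",
        "rocketqa-medium-cross-encoder",
    ]
    pairs = [["XXXXX关于XXXXXXXXXXX的通知", "XXXXXXXXXXXXXXXX"]]
    return compare_models(
        lambda name: Taskflow("text_similarity", model=name), models, pairs, times
    )
```

Calling `run_against_taskflow()` and printing `failed_on` and `error` for each model name should show which models fail and on which call, assuming the behaviour reported above is reproducible in your environment.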
chenhongwu127 commented 1 year ago

I'm running into the same problem and haven't been able to solve it yet. Any help would be appreciated.