PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

Pipelines semantic index: missing ImagetoTextConverter and semantic index docdir path problem #2761

Closed w5688414 closed 1 year ago

w5688414 commented 2 years ago

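You are welcome to report problems with using PaddleNLP, and thank you for contributing to PaddleNLP! When describing your problem, please also provide the following information: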

image

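When I ran this step yesterday, the error in the screenshot appeared. The step only completed after I commented out the two places shown in the screenshot.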

image

In addition, once the UI is up, running it produces this error:

image

Search only returned results after I changed the doc_dir path, so I suspect something is wrong with the data written to the ANN index store.

rest_api/pipeline/semantic_search.yaml

bruce0210 commented 2 years ago

Environment: 2 CPUs, 16 GiB RAM, CentOS 7.5, Anaconda with Python 3.8.5, paddlepaddle==2.3.0 (CPU), paddleNLP==2.3.4

That is, when following the steps in the documentation in order, the error shown in the screenshot below appears at this step: image

BTW: at run time the following message is shown first: image

Following https://github.com/jalan/pdftotext, pdftotext has to be installed first, which I did as follows:

wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz
tar -xvf xpdf-tools-linux-4.04.tar.gz
sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
pip install pdftotext
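After these installation steps, a quick check can confirm that both the pdftotext command-line tool and the Python package of the same name are visible to the interpreter that runs the pipeline. This is only a minimal sketch using the Python standard library; the helper name check_pdftotext is made up for illustration and is not part of PaddleNLP.

```python
import importlib.util
import shutil


def check_pdftotext() -> dict:
    """Report whether the pdftotext binary and Python module can be found.

    Hypothetical helper for illustration only, not a PaddleNLP API.
    """
    return {
        # Full path of the executable on PATH, or None if it is not found.
        "binary_path": shutil.which("pdftotext"),
        # True if `import pdftotext` would find a module in this environment.
        "python_module": importlib.util.find_spec("pdftotext") is not None,
    }


status = check_pdftotext()
print(status)
```

If binary_path is None or python_module is False, the corresponding installation step probably did not take effect in the active Anaconda environment.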

bruce0210 commented 2 years ago

Regarding the problem shown in the screenshot below: image

I modified index: baike_cities in rest_api/pipeline/semantic_search.yaml and commented out:

- name: ImageFileConverter
  type: ImageToTextConverter

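See the screenshot below for the exact location: image

After that I restarted the services one by one, and queries returned results normally.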
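The manual edit described here (commenting out the ImageFileConverter entry) could also be scripted. The sketch below works on plain text and makes no assumption about the real layout of semantic_search.yaml beyond the two lines quoted in this thread. The sample config, the OtherNode and LastNode entries, and the function name comment_out_component are hypothetical.

```python
def comment_out_component(yaml_text: str, component_name: str) -> str:
    """Comment out every YAML list item that starts with '- name: <component_name>'.

    Text-based sketch: an item is taken to run until the next line that is
    indented no deeper than its leading '-'.
    """
    out = []
    item_indent = None  # indentation of the '-' of the item being commented out
    for line in yaml_text.splitlines():
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if item_indent is not None:
            if stripped == "" or indent > item_indent:
                # Still inside the item: comment non-blank lines, keep blank ones.
                out.append("# " + line if stripped else line)
                continue
            item_indent = None  # the item has ended
        if (stripped.startswith("- name:")
                and stripped[len("- name:"):].strip() == component_name):
            item_indent = indent
            out.append("# " + line)
            continue
        out.append(line)
    return "\n".join(out)


# Hypothetical config fragment; only the ImageFileConverter entry is from the thread.
sample = """components:
  - name: OtherNode
    type: OtherType
  - name: ImageFileConverter
    type: ImageToTextConverter
  - name: LastNode
    type: LastType
"""

result = comment_out_component(sample, "ImageFileConverter")
print(result)
```

Note that this comments out every list item with that name, so any other place in the file that lists the same node by name would be disabled as well.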

w5688414 commented 2 years ago

image

pip install opencv-python==4.5.5.64
pip install opencv-contrib-python-headless==4.2.0.32
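To see whether an environment already matches the two pinned versions above, the installed package metadata can be queried from Python. This is a small sketch using only the standard library (Python 3.8+); the helper name installed_versions is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Versions pinned by the two pip commands in this comment.
wanted = {
    "opencv-python": "4.5.5.64",
    "opencv-contrib-python-headless": "4.2.0.32",
}

found = installed_versions(wanted)
for name, pinned in wanted.items():
    print(f"{name}: installed={found[name]}, pinned in this thread={pinned}")
```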

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.