PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: End-to-end semantic search system: the code example that builds a semantic search system on the DuReader-Robust dataset fails to run #4922

Closed: huangbo-bo closed this issue 5 months ago

huangbo-bo commented 1 year ago

Please describe your question

Environment:
OS: CentOS Linux release 7.9.2009 (Core)
python: 3.9.16
pip: 23.0.1
paddlepaddle: 2.4.1
paddlenlp: 2.5.1
paddle-pipelines: 0.4.0
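The versions listed above can be collected with a short script. This is a minimal sketch, assuming the distribution names `paddlepaddle`, `paddlepaddle-gpu`, `paddlenlp`, `paddle-pipelines`, `SQLAlchemy` and `faiss-cpu` as they appear in this thread (`collect_environment` is a hypothetical helper, not part of PaddleNLP). It also prints the version of the SQLite C library that Python's `sqlite3` module is linked against. That version is not in the list above but may be relevant to the SQLite error in the traceback.

```python
import platform
import sqlite3
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in this thread (hypothetical selection).
PACKAGES = ["pip", "paddlepaddle", "paddlepaddle-gpu", "paddlenlp",
            "paddle-pipelines", "SQLAlchemy", "faiss-cpu"]


def collect_environment(packages=PACKAGES) -> dict:
    """Return a name -> version mapping for the OS, interpreter, SQLite and selected packages."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        # Version of the SQLite C library the sqlite3 module is linked against.
        "sqlite": sqlite3.sqlite_version,
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```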

Command:
python examples/semantic-search/semantic_search_example.py --device cpu --search_engine faiss

Error:
Traceback (most recent call last):
  File "/root/PaddleNLP/pipelines/examples/semantic-search/semantic_search_example.py", line 187, in <module>
    semantic_search_tutorial()
  File "/root/PaddleNLP/pipelines/examples/semantic-search/semantic_search_example.py", line 166, in semantic_search_tutorial
    retriever = get_faiss_retriever(use_gpu)
  File "/root/PaddleNLP/pipelines/examples/semantic-search/semantic_search_example.py", line 93, in get_faiss_retriever
    document_store.update_embeddings(retriever)
  File "/usr/local/python39/lib/python3.9/site-packages/pipelines/document_stores/faiss.py", line 408, in update_embeddings
    for document_batch in batched_documents:
  File "/usr/local/python39/lib/python3.9/site-packages/pipelines/document_stores/base.py", line 681, in get_batches_from_generator
    x = tuple(islice(it, n))
  File "/usr/local/python39/lib/python3.9/site-packages/pipelines/document_stores/sql.py", line 358, in _query
    for i, row in enumerate(documents_query, start=1):
  File "/usr/local/python39/lib/python3.9/site-packages/pipelines/document_stores/sql.py", line 870, in _windowed_query
    for whereclause in self._column_windows(q.session, column, windowsize):
  File "/usr/local/python39/lib/python3.9/site-packages/pipelines/document_stores/sql.py", line 857, in _column_windows
    intervals = [id for id, in q]
  File "/usr/local/python39/lib/python3.9/site-packages/pipelines/document_stores/sql.py", line 857, in <listcomp>
    intervals = [id for id, in q]
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2901, in __iter__
    result = self._iter()
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1714, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/python39/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "(": syntax error
[SQL: SELECT anon_1.document_id AS anon_1_document_id FROM (SELECT document.id AS document_id, row_number() OVER (ORDER BY document.id) AS rownum FROM document) AS anon_1 WHERE rownum % 10000=1]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
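The failing statement uses the window function `row_number() OVER (...)`, which SQLite only supports from version 3.25.0. CentOS 7 ships an older SQLite (3.7.17), and a Python built from source there (as the `/usr/local/python39` prefix suggests) usually links against that system library. This is a plausible explanation, not one confirmed in this thread. The sketch below prints the linked SQLite version and runs a query with the same shape as the one in the traceback against an in-memory database. The table name `document` mirrors the traceback; the three sample rows and the window size of 2 (instead of 10000) are made up so that a tiny table is enough. `sqlite_supports_window_functions` is a hypothetical helper.

```python
import sqlite3

# Window functions such as ROW_NUMBER() OVER (...) were added in SQLite 3.25.0.
WINDOW_FUNCTIONS_MIN_VERSION = (3, 25, 0)


def sqlite_supports_window_functions() -> bool:
    """Run a query shaped like the failing one against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY)")
        conn.executemany("INSERT INTO document (id) VALUES (?)", [("a",), ("b",), ("c",)])
        # Same structure as the SQL in the traceback, with a window size of 2
        # instead of 10000: keep the ids whose row number is 1, 3, 5, ...
        rows = conn.execute(
            "SELECT anon_1.document_id FROM "
            "(SELECT document.id AS document_id, "
            "row_number() OVER (ORDER BY document.id) AS rownum FROM document) AS anon_1 "
            "WHERE rownum % 2 = 1 ORDER BY anon_1.document_id"
        ).fetchall()
        return rows == [("a",), ("c",)]
    except sqlite3.OperationalError:
        # Older SQLite rejects the OVER clause, e.g. with: near "(": syntax error
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("Version >= 3.25.0:", sqlite3.sqlite_version_info >= WINDOW_FUNCTIONS_MIN_VERSION)
    print("Window-function query runs:", sqlite_supports_window_functions())
```

On an interpreter linked against SQLite older than 3.25.0, the helper is expected to return `False`, which would be consistent with the `near "(": syntax error` above.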

w5688414 commented 1 year ago

I couldn't reproduce your problem locally. Could you provide more details? Alternatively, try again in a fresh virtual environment.

.....
Query: 亚马逊河流的介绍 (an introduction to the Amazon River)

{   'content': '亚马逊河大部分在巴西境内.巴西人自豪地称之为“河海”.亚马逊河是拉丁美洲人民的骄傲.亚马逊河滋润着南美洲的广袤土地,孕育了世界最大的热带雨林,使这一片地域成为世界上公认的最神秘的“生命王国”.亚马逊河 '
               '维基百科,自由的百科全书 跳转到:导航,搜索 亚马孙河 全长 6,296 km 源头海拔高度 5,597 m 平均流量 '
               '219,000 m³/s 流域面积 6,915,000 km² 源头 奈瓦多·米斯米峰 出海口 大西洋 '
               '流经国家 '
               '亚马逊河(大陆官方译名亚马孙河)位于南美洲,虽然长度在世界上处于第二位,但流量是世界上最大的,比其他三条大河:尼罗河、密西西比河和长江的流量总和还要大,亚马逊河的流域面积也是世界上最大的.亚马逊河向大西洋排放的水量达到了每秒18万4千立方米,相当于全世界所有河流向海洋排放的淡水总量的五分之一,从亚马逊河口直到肉眼看不到海岸的地方,海洋中的',
    'name': 'dev27.txt'}

This is my environment:

aiohttp==3.8.4
aiosignal==1.3.1
altair==4.2.2
anyio==3.6.2
astor @ file:///home/conda/feedstock_root/build_artifacts/astor_1593610464257/work
async-timeout==4.0.2
attrdict==2.0.1
attrs==22.2.0
Babel==2.11.0
bce-python-sdk==0.8.79
beautifulsoup4==4.11.2
blinker==1.5
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1648854164373/work
cachetools==5.3.0
certifi==2022.12.7
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1636046055389/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1661170624537/work
click==8.0.0
colorama==0.4.6
colorlog==6.7.0
contourpy==1.0.7
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1652967134882/work
cssselect==1.2.0
cssutils==2.6.0
cycler==0.11.0
Cython==0.29.33
datasets==2.9.0
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
dill==0.3.4
elasticsearch==7.10.0
entrypoints==0.4
et-xmlfile==1.1.0
faiss-cpu==1.7.3
fastapi==0.92.0
filelock==3.9.0
fire==0.5.0
Flask==2.2.3
Flask-Babel==2.0.0
fonttools==4.38.0
frozenlist==1.3.3
fsspec==2023.1.0
future==0.18.3
gitdb==4.0.10
GitPython==3.1.31
greenlet==2.0.2
grpcio==1.47.2
grpcio-tools==1.47.2
h11==0.14.0
htbuilder==0.6.1
huggingface-hub==0.12.1
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
imageio==2.25.1
imgaug==0.4.0
importlib-metadata==6.0.0
importlib-resources==5.12.0
itsdangerous==2.1.2
jieba==0.42.1
Jinja2==3.1.2
joblib==1.2.0
jsonschema==4.17.3
kiwisolver==1.4.4
langdetect==1.0.9
llvmlite==0.39.1
lmdb==1.4.0
lxml==4.9.2
Markdown==3.4.1
markdown-it-py==2.1.0
MarkupSafe==2.1.2
matplotlib==3.7.0
mdurl==0.1.2
mmh3==3.0.0
more-itertools==9.0.0
multidict==6.0.4
multiprocess==0.70.12.2
networkx==3.0
nltk==3.8.1
numba==0.56.4
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1651020388495/work
olefile @ file:///home/conda/feedstock_root/build_artifacts/olefile_1602866521163/work
opencv-contrib-python==4.6.0.66
opencv-contrib-python-headless==4.7.0.68
opencv-python==4.6.0.66
openpyxl==3.1.1
opt-einsum @ file:///home/conda/feedstock_root/build_artifacts/opt_einsum_1617859230218/work
packaging==23.0
paddle-bfloat @ file:///package/paddle_bfloat-0.1.7.tar.gz
paddle-pipelines==0.4.0
paddle2onnx==1.0.5
paddlefsl==1.1.0
paddlenlp==2.5.1
paddleocr==2.6.1.3
paddlepaddle-gpu @ file:///package/paddlepaddle_gpu-2.4.1.post112-cp39-cp39-linux_x86_64.whl
pandas==1.5.3
pdf2docx==0.5.6
pdfminer.six==20221105
pdfplumber==0.8.0
Pillow==9.4.0
premailer==3.10.0
protobuf==3.18.0
pyarrow==11.0.0
pyclipper==1.3.0.post4
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pycryptodome==3.17
pydantic==1.10.5
pydeck==0.8.0
Pygments==2.14.0
pymilvus==2.2.2
Pympler==1.0.1
PyMuPDF==1.20.2
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1663846997386/work
pyparsing==3.0.9
pyrsistent==0.19.3
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.8.2
python-docx==0.8.11
python-multipart==0.0.5
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
PyWavelets==1.4.1
PyYAML==6.0
rapidfuzz==2.13.7
regex==2022.10.31
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1673863902341/work
responses==0.18.0
rich==13.3.1
scikit-image==0.19.3
scikit-learn==1.2.1
scipy==1.10.1
semver==2.13.0
sentencepiece==0.1.97
seqeval==1.2.2
shapely==2.0.1
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smmap==5.0.0
sniffio==1.3.0
soupsieve==2.4
SQLAlchemy==1.4.46
SQLAlchemy-Utils==0.40.0
st-annotated-text==3.0.0
starlette==0.25.0
streamlit==1.9.0
termcolor==2.2.0
threadpoolctl==3.1.0
tifffile==2023.2.3
toml==0.10.2
toolz==0.12.0
tornado==6.2
tqdm==4.64.1
typer==0.7.0
typing_extensions==4.5.0
tzdata==2022.7
tzlocal==4.2
ujson==5.4.0
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1673452138552/work
uvicorn==0.20.0
validators==0.20.0
visualdl==2.4.2
Wand==0.6.11
watchdog==2.2.1
Werkzeug==2.2.3
xxhash==3.2.0
yarl==1.8.2
zipp==3.14.0

suntao2015005848 commented 1 year ago

I ran into the same problem: #6233