mrliangcb commented 1 year ago

Please ask your question

Hello! While deploying the semantic search system, the backend reports an error: search_phase_execution_exception

Software environment:

paddle-bfloat 0.1.7
paddle-pipelines 0.5.3
paddle2onnx 1.0.6
paddlefsl 1.1.0
paddlenlp 2.6.0rc0.post0
paddleocr 2.6.1.3
paddlepaddle-gpu 2.5.0.post116
onnx 1.14.0
onnxconverter-common 1.13.0
onnxruntime-gpu 1.15.1

I deployed the semantic search system by following this guide: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/pipelines/examples/semantic-search/Install_windows.md

All earlier steps worked, and the document data was written to the ANN index without problems. Inspecting the data:

curl http://localhost:9200/dureader_robust_query_encoder/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1398, "relation": "eq" },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "dureader_robust_query_encoder",
        "_id": "7b8b8e3eba66dfb34896562aee18ae78",
        "_score": 1.0,
        "_source": {
          "content": "爬行垫根据中间材料的不同可以分为:XPE爬行垫、EPE爬行垫、EVA爬行垫、PVC爬行垫;其中XPE爬行垫、EPE爬行垫都属于PE材料加保鲜膜复合而成,都是无异味的环保材料,但是XPE爬行垫是品质较好的爬行垫,韩国进口爬行垫都是这种爬行垫,而EPE爬行垫是国内厂家为了减低成本,使用EPE(珍珠棉)作为原料生产的一款爬行垫,该材料弹性差,易碎,开孔发泡防水性弱。EVA爬行垫、PVC爬行垫是用EVA或PVC作为原材料与保鲜膜复合的而成的爬行垫,或者把图案转印在原材料上,这两款爬行垫通常有异味,如果是图案转印的爬行垫,油墨外露容易脱落。当时我儿子爬的时候,我们也买了垫子,但是始终有味。最后就没用了,铺的就的薄毯子让他爬。",
          "content_type": "text",
          "__pydantic_initialised__": true,
          "name": "dev0.txt",
          "embedding": [……] ……

After running the command python -m streamlit run ui/webapp_semantic_search.py --server.port 8502 and clicking Run on the front-end page, the following error is reported:

Traceback (most recent call last):
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\pipelines\base.py", line 445, in run
    node_output, stream_id = self.graph.nodes[node_id]["component"]._dispatch_run(**node_input)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\base.py", line 120, in _dispatch_run
    return self._dispatch_run_general(self.run, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\base.py", line 164, in _dispatch_run_general
    output, stream = run_method(**run_inputs, **run_params)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\base.py", line 132, in run
    output, stream = run_query_timed(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\base.py", line 110, in wrapper
    ret = fn(*args, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\base.py", line 185, in run_query
    documents = self.retrieve(query=query, filters=filters, top_k=top_k, index=index, headers=headers, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\dense.py", line 215, in retrieve
    documents = self.document_store.query_by_embedding(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\document_stores\elasticsearch.py", line 1296, in query_by_embedding
    raise e
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\document_stores\elasticsearch.py", line 1286, in query_by_embedding
    result = self.client.search(index=index, body=body, request_timeout=300, headers=headers)["hits"]["hits"]
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\client\utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\client\__init__.py", line 1657, in search
    return self.transport.perform_request(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\transport.py", line 415, in perform_request
    raise e
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 273, in perform_request
    self._raise_error(response.status, raw_data)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\connection\base.py", line 322, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\middleware\cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 20, in __call__
    raise e
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\routing.py", line 66, in app
    response = await func(request)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\fastapi\routing.py", line 273, in app
    raw_response = await run_endpoint_function(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\fastapi\routing.py", line 192, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\starlette\concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File ".\rest_api\controller\search.py", line 101, in query
    result = _process_request(PIPELINE, request)
  File ".\rest_api\controller\search.py", line 222, in _process_request
    result = pipeline.run(query=request.query, params=params, debug=request.debug)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\pipelines\base.py", line 448, in run
    raise Exception(
Exception: Exception while running node Retriever with input {'root_node': 'Query', 'params': {'filters': {}, 'Retriever': {'top_k': 5, 'debug': False}, 'Ranker': {'top_k': 5}, 'Query': {'debug': False}}, 'query': '衡量酒水的价格的因素有哪些?', 'node_id': 'Retriever'}: RequestError(400, 'search_phase_execution_exception', 'runtime error'), full stack trace: Traceback (most recent call last):
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\pipelines\base.py", line 445, in run
    node_output, stream_id = self.graph.nodes[node_id]["component"]._dispatch_run(**node_input)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\base.py", line 120, in _dispatch_run
    return self._dispatch_run_general(self.run, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\base.py", line 164, in _dispatch_run_general
    output, stream = run_method(**run_inputs, **run_params)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\base.py", line 132, in run
    output, stream = run_query_timed(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\base.py", line 110, in wrapper
    ret = fn(*args, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\base.py", line 185, in run_query
    documents = self.retrieve(query=query, filters=filters, top_k=top_k, index=index, headers=headers, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\nodes\retriever\dense.py", line 215, in retrieve
    documents = self.document_store.query_by_embedding(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\document_stores\elasticsearch.py", line 1296, in query_by_embedding
    raise e
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\paddle_pipelines-0.5.3-py3.9.egg\pipelines\document_stores\elasticsearch.py", line 1286, in query_by_embedding
    result = self.client.search(index=index, body=body, request_timeout=300, headers=headers)["hits"]["hits"]
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\client\utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\client\__init__.py", line 1657, in search
    return self.transport.perform_request(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\transport.py", line 415, in perform_request
    raise e
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 273, in perform_request
    self._raise_error(response.status, raw_data)
  File "D:\anaconda3\envs\pipe-gpu2\lib\site-packages\elasticsearch\connection\base.py", line 322, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

Could you please help resolve this? Thanks!!
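The message 'runtime error' in the traceback is generic; the Elasticsearch error body normally nests a more specific reason under `failed_shards` and `caused_by`. The following is a minimal sketch for pulling out every (type, reason) pair from such a body. The sample body is hypothetical and only mirrors the usual shape of an Elasticsearch error response; the innermost reason text is a placeholder, not output from this issue. With elasticsearch-py 7.x the body is usually available on the exception's `info` attribute, which is an assumption worth verifying in your environment.

```python
from typing import Any, List, Optional, Tuple


def collect_reasons(node: Any, found: Optional[List[Tuple[str, str]]] = None) -> List[Tuple[str, str]]:
    """Recursively gather (type, reason) pairs from an Elasticsearch error body."""
    if found is None:
        found = []
    if isinstance(node, dict):
        # Record this level only when "reason" is plain text, not a nested object.
        if "type" in node and isinstance(node.get("reason"), str):
            found.append((node["type"], node["reason"]))
        for value in node.values():
            collect_reasons(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_reasons(item, found)
    return found


# Hypothetical error body, for illustration only.
SAMPLE_ERROR_BODY = {
    "error": {
        "root_cause": [{"type": "script_exception", "reason": "runtime error"}],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "failed_shards": [
            {
                "shard": 0,
                "index": "dureader_robust_query_encoder",
                "reason": {
                    "type": "script_exception",
                    "reason": "runtime error",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "placeholder: the specific cause would appear here",
                    },
                },
            }
        ],
    },
    "status": 400,
}

reasons = collect_reasons(SAMPLE_ERROR_BODY)
for error_type, reason in reasons:
    print(f"{error_type}: {reason}")
```

In practice one would wrap the failing `pipeline.run(...)` call in `try/except` and pass the caught exception's body to `collect_reasons`; the last pair printed is the most deeply nested one in a body shaped like the sample.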

mrliangcb commented 1 year ago

curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{"query":"衡量酒水的价格的因素有哪些?","params": {"Retriever": {"top_k": 5}, "Ranker":{"top_k": 5}}}'

Sending this request directly to the backend reports the same error.
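For reference, the same request can be built from Python, which avoids shell quoting differences between platforms. This is only a sketch using the standard library: the URL and payload fields are copied from the curl command above, while the helper names (`build_query_payload`, `build_request`, `send_query`) are hypothetical and not part of PaddleNLP pipelines.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
QUERY_URL = "http://localhost:8891/query"


def build_query_payload(query: str, retriever_top_k: int = 5, ranker_top_k: int = 5) -> dict:
    """Build the JSON body that the curl command sends."""
    return {
        "query": query,
        "params": {
            "Retriever": {"top_k": retriever_top_k},
            "Ranker": {"top_k": ranker_top_k},
        },
    }


def build_request(payload: dict, url: str = QUERY_URL) -> urllib.request.Request:
    """Wrap the payload in a POST request with a JSON content type."""
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(payload: dict, url: str = QUERY_URL, timeout: float = 30.0) -> dict:
    """Send the request; raises urllib.error.HTTPError on a 4xx/5xx response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_query_payload("衡量酒水的价格的因素有哪些?")
request = build_request(payload)
# Call send_query(payload) once the REST API is running on port 8891.
```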

mrliangcb commented 1 year ago

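After starting the REST API and the web UI, http://localhost:8502/ shows: An error occurred during the request.

Test procedure: the query was “衡量酒水的价格的因素有哪些?” (What factors determine the price of alcoholic beverages?)

I printed the variables inside the function def _process_request(pipeline, request) -> Dict[str, Any]: in \PaddleNLP-develop\pipelines\rest_api\controller\search.py. The code is as follows: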

def _process_request(pipeline, request) -> Dict[str, Any]:
    start_time = time.time()
    params = request.params or {}
    print("params：", params)
    print("request.query:", request.query)

    # format global, top-level filters (e.g. "params": {"filters": {"name": ["some"]}})
    if "filters" in params.keys():
        params["filters"] = _format_filters(params["filters"])
    # format targeted node filters (e.g. "params": {"Retriever": {"filters": {"value"}}})
    for key in params.keys():
        if "filters" in params[key].keys():
            params[key]["filters"] = _format_filters(params[key]["filters"])
    result = pipeline.run(query=request.query, params=params, debug=request.debug)

……

Printed output:

params： {'filters': {}, 'Retriever': {'top_k': 30}, 'Ranker': {'top_k': 3}}
request.query: 衡量酒水的价格的因素有哪些?

'filters' is an empty dict {}.

The call result = pipeline.run(query=request.query, params=params, debug=request.debug) is what raises the error shown above: “elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')”

mrliangcb commented 1 year ago

curl http://localhost:9200/_aliases?pretty=true returns:

{
  "label": { "aliases": {} },
  "dureader_robust_query_encoder": { "aliases": {} }
}
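The `_aliases` response above lists the indices that exist on the Elasticsearch node, so it can be compared against whatever index name the pipeline is configured to query. A small sketch, with the response body copied from the output above; the helper names `list_indices` and `index_exists` are hypothetical.

```python
import json

# Response body copied from the `_aliases` call above.
ALIASES_RESPONSE = '{"label": {"aliases": {}}, "dureader_robust_query_encoder": {"aliases": {}}}'


def list_indices(aliases_json: str) -> list:
    """Return the sorted index names found in an `_aliases` response."""
    return sorted(json.loads(aliases_json).keys())


def index_exists(aliases_json: str, index_name: str) -> bool:
    """Check whether a given index name appears in the `_aliases` response."""
    return index_name in json.loads(aliases_json)


indices = list_indices(ALIASES_RESPONSE)
print(indices)
print(index_exists(ALIASES_RESPONSE, "dureader_robust_query_encoder"))
```

If the index name in the pipeline's YAML configuration is not in this list, the query is going to a different index than the one the documents were written to, which is worth ruling out before digging further.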

LoveRiverLi commented 1 year ago

Can any expert help answer this?

ColorfulDick commented 11 months ago

I ran into the same problem. Did you manage to solve it?

lc643476 commented 11 months ago

Could you post your semantic_search.yaml configuration file? The configuration may be wrong.

caolongwei commented 10 months ago

version: '1.1.0'

components:    # define all the building-blocks for Pipeline
  - name: DocumentStore
    type: ElasticsearchDocumentStore  # consider using Milvus2DocumentStore or WeaviateDocumentStore for scaling to large number of documents
    params:
      host: localhost
      port: 9200
      index: dureader_robust_base_encoder
      embedding_dim: 768
  - name: Retriever
    type: DensePassageRetriever
    params:
      document_store: DocumentStore    # params can reference other components defined in the YAML
      top_k: 10
      query_embedding_model: rocketqa-zh-base-query-encoder
      passage_embedding_model: rocketqa-zh-base-para-encoder
      embed_title: False
  - name: Ranker       # custom-name for the component; helpful for visualization & debugging
    type: ErnieRanker    # pipelines Class name for the component
    params:
      model_name_or_path: rocketqa-base-cross-encoder
      top_k: 3
  - name: TextFileConverter
    type: TextConverter
  - name: ImageFileConverter
    type: ImageToTextConverter
  - name: PDFFileConverter
    type: PDFToTextConverter
  - name: DocxFileConverter
    type: DocxToTextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 1000
  - name: FileTypeClassifier
    type: FileTypeClassifier

pipelines:
  - name: query
    type: Query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Ranker
        inputs: [Retriever]
  - name: indexing
    type: Indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextFileConverter
        inputs: [FileTypeClassifier.output_1]
      - name: PDFFileConverter
        inputs: [FileTypeClassifier.output_2]
      - name: DocxFileConverter
        inputs: [FileTypeClassifier.output_4]
      - name: ImageFileConverter
        inputs: [FileTypeClassifier.output_6]
      - name: Preprocessor
        inputs: [PDFFileConverter, TextFileConverter, DocxFileConverter, ImageFileConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
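Two values in a configuration like this have to agree with what is actually stored in Elasticsearch: `index` and `embedding_dim`. The sketch below compares them against a list of existing indices and an index mapping. It is an illustration under assumptions, not a confirmed diagnosis of this issue: the mapping layout follows the usual shape of an Elasticsearch 7+ `GET /<index>/_mapping` response, the field name `embedding` is taken from the `_source` shown earlier in the thread, and the `dims` value of 312 in the sample is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def stored_embedding_dims(mapping_response: Dict[str, Any], index: str, field: str = "embedding") -> Optional[int]:
    """Read the dense_vector dimension for `field` from a `_mapping` response, if present."""
    properties = mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    return properties.get(field, {}).get("dims")


def find_config_problems(
    configured_index: str,
    configured_dim: int,
    existing_indices: Iterable[str],
    mapping_response: Dict[str, Any],
) -> List[str]:
    """Return human-readable mismatches between the YAML config and the stored index."""
    existing = sorted(existing_indices)
    if configured_index not in existing:
        return [f"index '{configured_index}' not found; existing indices: {existing}"]
    problems = []
    dims = stored_embedding_dims(mapping_response, configured_index)
    if dims is not None and dims != configured_dim:
        problems.append(f"embedding_dim is {configured_dim} in the config but {dims} in the index mapping")
    return problems


# Index names as reported by the `_aliases` call earlier in the thread.
EXISTING_INDICES = ["label", "dureader_robust_query_encoder"]

# Hypothetical mapping response; the dims value is illustrative only.
SAMPLE_MAPPING = {
    "dureader_robust_query_encoder": {
        "mappings": {"properties": {"embedding": {"type": "dense_vector", "dims": 312}}}
    }
}

# Values from the YAML above: index dureader_robust_base_encoder, embedding_dim 768.
for problem in find_config_problems("dureader_robust_base_encoder", 768, EXISTING_INDICES, SAMPLE_MAPPING):
    print(problem)
```

An empty result only means these two settings are consistent; the query and passage embedding models also need to match the ones used when the index was built.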

liudonglei commented 10 months ago

I ran into the same problem at step 3.4.4 of this tutorial: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/unsupervised-question-answering

mmyyly commented 5 months ago

I ran into the same problem. I hope someone can take a look; it has been bothering me for several days, which is frustrating.

PaddlePaddle / PaddleNLP

[Question]: Backend error when deploying the semantic search system: search_phase_execution_exception #6535
