PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
11.98k stars 2.92k forks

[Question]: Elasticsearch errors when deploying the semantic search system on a server #4988

Open fzg0202 opened 1 year ago

fzg0202 commented 1 year ago

Please ask your question

Error from the ES image: {"@timestamp":"2023-02-24T19:50:11.945Z", "log.level":"ERROR", "message":"exception during geoip databases update", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5f3fc13b7220][generic][T#3]","log.logger":"org.elasticsearch.ingest.geoip.GeoIpDownloader","elasticsearch.cluster.uuid":"9Zad9o2zTOW7jSKaAqkk5g","elasticsearch.node.id":"x-QcZjn6QSKjyMqljFcQyw","elasticsearch.node.name":"5f3fc13b7220","elasticsearch.cluster.name":"docker-cluster","error.type":"java.net.UnknownHostException","error.message":"geoip.elastic.co","error.stack_trace":"java.net.UnknownHostException: geoip.elastic.co\n\tat java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:564)\n\tat java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)\n\tat java.base/java.net.Socket.connect(Socket.java:633)\n\tat java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304)\n\tat java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178)\n\tat java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:498)\n\tat java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:603)\n\tat java.base/sun.net.www.protocol.https.HttpsClient.(HttpsClient.java:264)\n\tat java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:378)\n\tat java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:189)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128)\n\tat java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:175)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665)\n\tat java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)\n\tat java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529)\n\tat java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308)\n\tat org.elasticsearch.ingest.geoip.HttpClient.lambda$get$0(HttpClient.java:46)\n\tat java.base/java.security.AccessController.doPrivileged(AccessController.java:569)\n\tat org.elasticsearch.ingest.geoip.HttpClient.doPrivileged(HttpClient.java:88)\n\tat org.elasticsearch.ingest.geoip.HttpClient.get(HttpClient.java:40)\n\tat org.elasticsearch.ingest.geoip.HttpClient.getBytes(HttpClient.java:36)\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloader.fetchDatabasesOverview(GeoIpDownloader.java:155)\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(GeoIpDownloader.java:143)\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(GeoIpDownloader.java:274)\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:102)\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:48)\n\tat org.elasticsearch.server@8.3.3/org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(NodePersistentTasksExecutor.java:42)\n\tat org.elasticsearch.server@8.3.3/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769)\n\tat org.elasticsearch.server@8.3.3/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"}

Error from the paddle image: ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': '9200'}] and that it has finished the initial ramp up (can take > 30s).

w5688414 commented 1 year ago

Please check whether ES is running normally. Starting it with Docker is recommended:

docker network create elastic
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.3.3
docker run \
      -d \
      --name es02 \
      --net elastic \
      -p 9200:9200 \
      -e discovery.type=single-node \
      -e ES_JAVA_OPTS="-Xms256m -Xmx256m"\
      -e xpack.security.enabled=false \
      -e cluster.routing.allocation.disk.threshold_enabled=false \
      -it \
      docker.elastic.co/elasticsearch/elasticsearch:8.3.3

curl http://localhost:9200/_aliases?pretty=true
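
The curl check above can also be scripted. As a minimal sketch, assuming ES is published on localhost:9200 as in the docker run command, the hypothetical helper `wait_for_http` below polls the endpoint until it answers, which covers the ramp-up period ("can take > 30s") mentioned in the connection error. The helper name, the timeouts, and the injectable `opener` parameter are illustrative choices, not part of PaddleNLP or Elasticsearch.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=60.0, interval_s=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Returns True once a 200 response is seen, False on timeout.
    `opener` exists only so the function can be exercised without a network.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet (connection refused, DNS failure, HTTP error)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (assumes the port mapping -p 9200:9200 from the command above):
# wait_for_http("http://localhost:9200/_aliases?pretty=true")
```

If this returns False from inside another container while curl succeeds on the host, the problem is more likely network reachability than ES startup.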

https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/semantic-search

fzg0202 commented 1 year ago

@w5688414 When I access ES at that URL, the response is { "dureader_robust_query_encoder" : { "aliases" : { } }, "label" : { "aliases" : { } } }. Does that mean it started successfully?

w5688414 commented 1 year ago

Yes, it started successfully.

fzg0202 commented 1 year ago

@w5688414 If accessing http://localhost:9200/_aliases?pretty=true returns an empty dictionary, what is the reason?

w5688414 commented 1 year ago

It means there is no data in ES yet.
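
To make the two responses in this thread concrete: the `_aliases` response contains one top-level key per index, so an empty object means no index exists yet. The sketch below only parses the JSON text; `index_names` is a hypothetical helper name, and the sample body is the response quoted earlier in this thread.

```python
import json


def index_names(aliases_body):
    """Return the sorted index names found in an `_aliases` JSON response."""
    return sorted(json.loads(aliases_body).keys())


# Response quoted earlier in this thread: two indices, neither with aliases.
populated = '{"dureader_robust_query_encoder": {"aliases": {}}, "label": {"aliases": {}}}'
print(index_names(populated))  # ['dureader_robust_query_encoder', 'label']

# An empty dictionary: ES is up, but nothing has been indexed.
print(index_names("{}"))  # []
```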

fzg0202 commented 1 year ago

@w5688414 ES starts normally (verified with the URL above), but I still get a connection error. What is going wrong?

fzg0202 commented 1 year ago

@w5688414 The error message is: ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': '9200'}] and that it has finished the initial ramp up (can take > 30s).

w5688414 commented 1 year ago

If ES is working normally, please make sure that the environment where you run the pipelines semantic search can actually reach localhost.
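
One way to test this from inside the pipelines environment is a plain TCP check. Inside a Docker container, localhost refers to the container itself, so an ES instance running in a different container is not reachable there unless host networking is used. The sketch below uses only the standard library; `can_connect` is a hypothetical helper, and the host name es02 in the usage comment is an assumption that only holds if the pipelines container is attached to the same elastic network as the ES container started above.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


# Usage, run inside the container that hosts the pipelines service:
# for host in ("localhost", "es02"):  # "es02" is an assumption, see above
#     print(host, can_connect(host, 9200))
```

If only the container name is reachable, the ES host configured for the pipelines service would need to point at that name instead of localhost.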

fzg0202 commented 1 year ago

@w5688414 localhost cannot be reached. In this case, do I need to go into the paddle container and change the IP mapping in /etc/hosts?

fzg0202 commented 1 year ago

@w5688414 I followed the setup procedure for the end-to-end semantic search system and found that step 3.4.3, "Start the RestAPI model service", does not pass verification. The verification command is: curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{"query": "衡量酒水的价格的因素有哪些?","params": {"Retriever": {"top_k": 5}, "Ranker":{"top_k": 5}}}'

The error message is: Failed to connect to localhost port 8891: Connection refused. What is the problem here?

fzg0202 commented 1 year ago

@w5688414 Hello, the CPU version now works for me. But after switching to the GPU version, running PaddleNLP/pipelines/run_serve.sh fails with: Segmentation fault (core dumped). What is the cause?

fzg0202 commented 1 year ago

@w5688414 Hello. The CPU version currently works for me, but the GPU version does not when I deploy it quickly from the Docker image. The error appears as soon as I run create_index.sh. The image I am using is registry.baidubce.com/paddlepaddle/paddlenlp:2.4.0-gpu-cuda11.2-cudnn8; my own machine has CUDA 11.2 and cuDNN 8.1.0. The error output is:

W0228 16:48:30.899076 396 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.4, Runtime API Version: 11.2
W0228 16:48:30.910549 396 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
W0228 16:48:32.975342 396 dynamic_loader.cc:305] The third-party dynamic library (libcuda.so) that Paddle depends on is not configured correctly. (error code is /usr/lib/x86_64-linux-gnu/libcuda.so: file too short)
Suggestions:

  1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure third-party dynamic library environment variables as follows:
    • Linux: set LD_LIBRARY_PATH by export LD_LIBRARY_PATH=...
    • Windows: set PATH by `set PATH=XXX;

Segmentation fault (core dumped)

Could you tell me the exact CUDA and cuDNN versions that this image corresponds to?

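
The "file too short" part of the message usually comes from the dynamic loader when the file is too small to hold a valid ELF header, so it can help to inspect /usr/lib/x86_64-linux-gnu/libcuda.so inside the container before comparing CUDA versions. The following is a rough sketch using only the standard library; `describe_shared_library` is a hypothetical helper, and it checks nothing beyond file size and the ELF magic bytes.

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF shared object


def describe_shared_library(path):
    """Return a short diagnosis string for the shared library at `path`."""
    if not os.path.isfile(path):
        return "missing or not a regular file"  # also covers dangling symlinks
    size = os.path.getsize(path)
    if size == 0:
        return "empty file"
    with open(path, "rb") as f:
        if f.read(4) != ELF_MAGIC:
            return "not an ELF file"
    return f"looks like an ELF file ({size} bytes)"


# Usage inside the container (path taken from the error message above):
# print(describe_shared_library("/usr/lib/x86_64-linux-gnu/libcuda.so"))
```

An "empty file" result would point at how the GPU driver libraries are exposed to the container rather than at a CUDA or cuDNN version mismatch, although this check alone cannot confirm that.
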
w5688414 commented 1 year ago

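Check whether Paddle itself runs correctly.

After installation, start the Python interpreter with python, enter import paddle, and then enter paddle.utils.run_check().

If "PaddlePaddle is installed successfully!" appears, the installation succeeded.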
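
Because a segmentation fault kills the interpreter outright, it can be convenient to run this check in a child process and inspect its exit status. The sketch below uses only the standard library; `run_isolated` and `describe_exit` are hypothetical helper names, and the signal decoding relies on POSIX behaviour, where a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout_s=300):
    """Run `code` in a fresh Python process and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    return result.returncode


def describe_exit(returncode):
    """Translate a subprocess return code into a readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:  # POSIX: killed by signal number -returncode
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Usage (requires paddle to be installed in the same environment):
# print(describe_exit(run_isolated("import paddle; paddle.utils.run_check()")))
```

A result of "killed by SIGSEGV" would reproduce the crash outside the pipelines scripts, which narrows the problem to the Paddle and GPU runtime setup rather than to create_index.sh or run_serve.sh.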

fzg0202 commented 1 year ago

@w5688414 Hello, I did as you suggested and got an error. The error message is as follows:

No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1677810217 (unix time) try "date -d @1677810217" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 449 (TID 0x7f6374f5f740) from PID 0 ***]

Segmentation fault (core dumped)

Does this mean the Paddle version is the cause?