[Question]: Elasticsearch error when deploying the semantic retrieval system on a server #4988

Open fzg0202 opened 1 year ago

fzg0202 commented 1 year ago


Error from the ES image: {"@timestamp":"2023-02-24T19:50:11.945Z", "log.level":"ERROR", "message":"exception during geoip databases update", "ecs.version": "1.2.0","":"ES_ECS","event.dataset":"elasticsearch.server","":"elasticsearch[5f3fc13b7220][generic][T#3]","log.logger":"org.elasticsearch.ingest.geoip.GeoIpDownloader","elasticsearch.cluster.uuid":"9Zad9o2zTOW7jSKaAqkk5g","":"x-QcZjn6QSKjyMqljFcQyw","":"5f3fc13b7220","":"docker-cluster","error.type":"","error.message":"","error.stack_trace":"\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat java.base/\n\tat org.elasticsearch.ingest.geoip.HttpClient.lambda$get$0(\n\tat java.base/\n\tat org.elasticsearch.ingest.geoip.HttpClient.doPrivileged(\n\tat org.elasticsearch.ingest.geoip.HttpClient.get(\n\tat org.elasticsearch.ingest.geoip.HttpClient.getBytes(\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloader.fetchDatabasesOverview(\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(\n\tat org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(\n\tat org.elasticsearch.server@8.3.3/org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(\n\tat org.elasticsearch.server@8.3.3/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(\n\tat org.elasticsearch.server@8.3.3/\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$\n\tat java.base/\n"}

Error from the Paddle image: ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': '9200'}] and that it has finished the initial ramp up (can take > 30s).

w5688414 commented 1 year ago


docker network create elastic
docker pull
docker run \
      -d \
      --name es02 \
      --net elastic \
      -p 9200:9200 \
      -e discovery.type=single-node \
      -e ES_JAVA_OPTS="-Xms256m -Xmx256m"\
      -e \
      -e cluster.routing.allocation.disk.threshold_enabled=false \
      -it \

curl http://localhost:9200/_aliases?pretty=true
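
The curl check above can also be scripted so that it keeps retrying while Elasticsearch is still starting up (the error message mentions that the initial ramp up can take more than 30 seconds). The following is a minimal sketch using only the Python standard library; the function name `wait_for_http` and the timeout values are illustrative assumptions, not part of PaddleNLP or Elasticsearch.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Hypothetical helper for illustration; returns True on success,
    False if the deadline passes without a 200 response.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet (or non-2xx answer); retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example call, using the URL from the curl command above:
# wait_for_http("http://localhost:9200/_aliases?pretty=true")
```

Note that this only proves reachability from wherever the script runs. Running it on the host and running it inside another container can give different results for `localhost`.
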

fzg0202 commented 1 year ago

@w5688414 Accessing ES at that URL returns { "dureader_robust_query_encoder" : { "aliases" : { } }, "label" : { "aliases" : { } } }. Does that mean it started successfully?

w5688414 commented 1 year ago

fzg0202 commented 1 year ago

@w5688414 If accessing http://localhost:9200/_aliases?pretty=true returns an empty dictionary, what is the cause?

w5688414 commented 1 year ago

fzg0202 commented 1 year ago

@w5688414 ES starts normally (verified with the URL above), but the connection error is still reported. What is going wrong?

fzg0202 commented 1 year ago

@w5688414 The error message is: ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': '9200'}] and that it has finished the initial ramp up (can take > 30s).

w5688414 commented 1 year ago

fzg0202 commented 1 year ago

@w5688414 localhost cannot be reached. In this case, do I need to go into the Paddle container and change the IP mapping in its /etc/hosts file?
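
Inside a container, `localhost` refers to that container itself, not to the host or to the ES container, so a quick way to narrow this down is to test which hostname actually accepts TCP connections on port 9200 from inside the Paddle container. The sketch below uses only the standard library. The candidate `es02` is the container name from the `docker run` command earlier in the thread, and it resolves by name only if both containers are attached to the same user-defined network (`elastic` in that command); whether the Paddle container is attached to it is an assumption to verify. `host.docker.internal` is a further assumption that holds only on setups that define it.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def first_reachable(candidates, port, timeout_s=3.0):
    """Return the first host in `candidates` that accepts connections, else None."""
    for host in candidates:
        if can_connect(host, port, timeout_s):
            return host
    return None


# Example call from inside the Paddle container (hostnames are assumptions):
# first_reachable(["localhost", "es02", "host.docker.internal"], 9200)
```

If one of the candidates is reachable, pointing the pipeline's Elasticsearch host setting at that name is usually less fragile than editing /etc/hosts by hand.
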

fzg0202 commented 1 year ago

@w5688414 I followed the setup steps for the end-to-end semantic retrieval system and found that step 3.4.3, "Start the RestAPI model service", fails verification. The verification command is: curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{"query": "衡量酒水的价格的因素有哪些?","params": {"Retriever": {"top_k": 5}, "Ranker":{"top_k": 5}}}'

The error message is: Failed to connect to localhost port 8891: Connection refused. What is causing this?
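
For reference, the same verification request can be issued from Python, which makes it easier to run from whichever machine or container is under test. This is a sketch that mirrors the curl command above (URL, header, and payload are taken from it); the helper names are hypothetical, and treating the response body as JSON is an assumption. "Connection refused" means nothing is listening on that port at that address, so `send_query` would raise `urllib.error.URLError` in that situation.

```python
import json
import urllib.request


def build_query_request(url, query, retriever_top_k=5, ranker_top_k=5):
    """Build the POST request that the curl verification command sends."""
    payload = {
        "query": query,
        "params": {
            "Retriever": {"top_k": retriever_top_k},
            "Ranker": {"top_k": ranker_top_k},
        },
    }
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(url, query, timeout_s=30.0):
    """Send the query and decode the reply (assumed here to be JSON)."""
    req = build_query_request(url, query)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call, matching the curl command:
# send_query("http://localhost:8891/query", "衡量酒水的价格的因素有哪些?")
```
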

fzg0202 commented 1 year ago

@w5688414 Hello, the CPU version works for me now. But after switching to the GPU version, running PaddleNLP/pipelines/run_serve.sh fails with: Segmentation fault (core dumped). What is the cause?

fzg0202 commented 1 year ago

@w5688414 Hello. The GPU version currently works on my side. But when I use the GPU version deployed quickly through the docker image and run create_index.sh directly, it fails. The image I am using is , and my own machine has CUDA 11.2 and cuDNN 8.1.0. The error output is:

W0228 16:48:30.899076 396] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.4, Runtime API Version: 11.2
W0228 16:48:30.910549 396] device: 0, cuDNN Version: 8.1.
W0228 16:48:32.975342 396] The third-party dynamic library ( that Paddle depends on is not configured correctly. (error code is /usr/lib/x86_64-linux-gnu/ file too short)
Suggestions:

  1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure third-party dynamic library environment variables as follows:
    • Linux: set LD_LIBRARY_PATH by export LD_LIBRARY_PATH=...
    • Windows: set PATH by `set PATH=XXX;`

Segmentation fault (core dumped)

Could you tell me the exact CUDA and cuDNN versions that this image corresponds to?

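
The "file too short" text in the log above is what the dynamic loader reports when a shared library file is too small to contain a valid header, typically an empty or truncated file. The sketch below scans library directories for such files. The library name is elided in the log, so the default pattern `*.so*` is deliberately broad; the helper names and the 64-byte threshold (the size of a 64-bit ELF header) are assumptions for illustration.

```python
import os
from pathlib import Path


def find_suspicious_libs(directories, pattern="*.so*", min_size=64):
    """Return (path, size) pairs for matching files smaller than `min_size`.

    A size of -1 marks entries that could not be stat'ed, for example
    dangling symlinks.
    """
    suspicious = []
    for directory in directories:
        d = Path(directory)
        if not d.is_dir():
            continue
        for path in sorted(d.glob(pattern)):
            try:
                size = path.stat().st_size  # stat() follows symlinks
            except OSError:
                suspicious.append((str(path), -1))
                continue
            if path.is_file() and size < min_size:
                suspicious.append((str(path), size))
    return suspicious


def library_dirs():
    """Directories from LD_LIBRARY_PATH plus the directory named in the log."""
    dirs = [p for p in os.environ.get("LD_LIBRARY_PATH", "").split(":") if p]
    dirs.append("/usr/lib/x86_64-linux-gnu")
    return dirs


# Example call inside the container:
# for path, size in find_suspicious_libs(library_dirs()):
#     print(path, size)
```

An empty result does not prove the libraries match the installed paddlepaddle build; it only rules out truncated files as the cause.
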
w5688414 commented 1 year ago

After installation, start the Python interpreter with python, enter import paddle, then enter paddle.utils.run_check()

If PaddlePaddle is installed successfully! appears, the installation succeeded.

fzg0202 commented 1 year ago

@w5688414 Hello, I did as you said and got an error. The error message is as follows:

No stack trace in paddle, may be caused by external reasons.

Error Message Summary:
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1677810217 (unix time) try "date -d @1677810217" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 449 (TID 0x7f6374f5f740) from PID 0 ***]

Segmentation fault (core dumped)
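
Because a segmentation fault kills the interpreter outright, a Python `try`/`except` cannot catch it. One way to capture the outcome is to run the check in a child process and inspect its return code. The sketch below does that with the standard library; the helper names are hypothetical, and the mapping of negative return codes to signals applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout_s=300):
    """Run `code` in a fresh interpreter; return (returncode, combined output)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, proc.stdout + proc.stderr


def describe_returncode(returncode):
    """Summarise a return code; on POSIX, negative means killed by a signal."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


# Example call, wrapping the check suggested earlier in the thread:
# rc, output = run_isolated("import paddle; paddle.utils.run_check()")
# print(describe_returncode(rc))
# print(output)
```

The captured output keeps any warnings printed before the crash, which is usually the most useful part to attach to a report like this one.
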
