PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.11k stars 2.94k forks

[Question]: Connection Error. Is pipelines running? #4927

Closed fzg0202 closed 8 months ago

fzg0202 commented 1 year ago

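Please ask your question

When I deploy the semantic search system on Linux via Docker (the quick-deployment route), I get the error "Connection Error. Is pipelines running? An error occurred during the request." My CUDA version is 11.6. How can I fix this?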

w5688414 commented 1 year ago

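Hello, you can run docker logs to see what the error log says.

If the problem is the CUDA version the Docker image was built with, you can follow the tutorial below to build a CUDA 11.6 image: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/docker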

fzg0202 commented 1 year ago

@w5688414 The error in my log is: elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port='9200'): Read timed out. (read timeout=30)). Also, the paddlepaddle-gpu version inside the container is 2.3.2.post112. Does this mean the problem is with the elastic image, or do I need to build a CUDA 11.6 image?

w5688414 commented 1 year ago

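Check whether Elasticsearch is reachable. By default the ES and pipelines containers must run on the same host machine, and pipelines also needs to be able to reach the host's localhost. What command did you use to start Docker?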
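
The reachability check suggested above can be sketched in Python with only the standard library. This is a minimal sketch, not part of PaddleNLP: the helper name `http_responds` is hypothetical, and the URL assumes Elasticsearch is listening on localhost:9200, the host and port shown in the error log. Because the log reports a read timeout rather than a refused connection, the sketch waits for an actual HTTP response instead of only opening a TCP socket.

```python
import urllib.error
import urllib.request


def http_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to url gets any response within timeout.

    Hypothetical helper for illustration; not a PaddleNLP or pipelines API.
    """
    # Ignore proxy environment variables so a local service is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (for example 401).
        return True
    except OSError:
        # Covers refused connections, DNS failures, and socket timeouts.
        return False


if __name__ == "__main__":
    # Assumed address: the host and port from the ConnectionTimeout message.
    print("Elasticsearch responds:", http_responds("http://localhost:9200"))
```

Running this once on the host and once inside the pipelines container would show whether the container can reach the same localhost:9200 that the host can.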

fzg0202 commented 1 year ago

The command I used to start Docker was: nvidia-docker run -d --name paddlenlp_pipelines_gpu --net host -ti registry.baidubce.com/paddlepaddle/paddlenlp:2.4.0-gpu-cuda11.2-cudnn8. Both containers were installed on the same server, following the steps on GitHub.

fzg0202 commented 1 year ago

@w5688414 On my local Windows machine (Win11), I deployed the elastic and paddlenlp_pipelines images via quick deployment and still get the "Connection Error. Is pipelines running?" error. The error in the paddlenlp_pipelines container log is: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='paddlenlp.bj.bcebos.com', port=443): Max retries exceeded with url: /applications/dureader_dev.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd38c40f410>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

w5688414 commented 1 year ago

It looks like the external network is not reachable.
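
The "[Errno -3] Temporary failure in name resolution" message comes from the DNS lookup, before any download starts. A minimal sketch for confirming this from inside the container, using only the standard library, follows. The helper name `can_resolve` is hypothetical; the hostname is the one from the error message.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can turn hostname into an address.

    Hypothetical helper for illustration only.
    """
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        # The same exception type that appears in the container traceback.
        return False


if __name__ == "__main__":
    print("DNS works:", can_resolve("paddlenlp.bj.bcebos.com"))
```

If this prints False inside the container but True on the host, the container's DNS or network configuration is the likely culprit rather than the pipelines code.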

fzg0202 commented 1 year ago

@w5688414 Following the reply in that other issue, I downloaded dureader_dev.zip and baike.zip, the files referenced in [PaddleNLP/pipelines/utils/offline_ann.py], uploaded them to the server, and placed them in the PaddleNLP/pipelines/utils/data folder, but it still fails. What is wrong here? Also, how should the following in offline_ann.py be modified: data_dict = { 'data/dureader_dev': "https://paddlenlp.bj.bcebos.com/applications/dureader_dev.zip", "data/baike": "https://paddlenlp.bj.bcebos.com/applications/baike.zip" }

w5688414 commented 1 year ago

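So you cannot access the external network, is that right? This is just the sample data we provide. You can put dureader_dev.zip in the data directory and unzip it there, then comment out the download code in offline_ann.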
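
The manual unzip step described above can also be scripted. The following is a minimal sketch under stated assumptions: `extract_local_archive` is a hypothetical helper, not a function in PaddleNLP pipelines, and the paths in the usage line are taken from the `data_dict` keys quoted earlier in the thread. It only unpacks a local archive; the download call in offline_ann.py still has to be commented out separately.

```python
import zipfile
from pathlib import Path


def extract_local_archive(zip_path: str, output_dir: str) -> bool:
    """Unpack zip_path into output_dir unless output_dir already has files.

    Returns True if extraction ran, False if it was skipped.
    Hypothetical helper for illustration only.
    """
    out = Path(output_dir)
    if out.is_dir() and any(out.iterdir()):
        return False  # data already in place, nothing to do
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    return True


if __name__ == "__main__":
    # Assumed layout: the zip was copied next to its target directory.
    extract_local_archive("data/dureader_dev.zip", "data/dureader_dev")
```

Whether the archive contains its own top-level folder is not stated in the thread, so it is worth checking the resulting directory layout after extraction.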

fzg0202 commented 1 year ago

@w5688414 I did as you said, but after commenting out the code and unzipping, it still fails. The error output is:

INFO - pipelines.utils.import_utils - Fetching from https://paddlenlp.bj.bcebos.com/applications/dureader_dev.zip to data/dureader_dev
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 187, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fa3752ac390>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/requests/adapters.py", line 499, in send
    timeout=timeout,
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 786, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='paddlenlp.bj.bcebos.com', port=443): Max retries exceeded with url: /applications/dureader_dev.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa3752ac390>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "utils/offline_ann.py", line 111, in <module>
    output_dir=args.doc_dir)
  File "/usr/local/lib/python3.7/dist-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/utils/import_utils.py", line 87, in fetch_archive_from_http
    request_data = requests.get(url, proxies=proxies)
  File "/usr/local/lib/python3.7/dist-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='paddlenlp.bj.bcebos.com', port=443): Max retries exceeded with url: /applications/dureader_dev.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa3752ac390>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
Delete an existing elasticsearch index dureader_robust_query_encoder Done.
/bin/sh: 1: docker: not found
WARNING - pipelines.utils.doc_store - Tried to start Elasticsearch through Docker but this failed. It is likely that there is already an existing Elasticsearch instance running.
[]
INFO - pipelines.utils.common_utils - Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils - Number of GPUs: 0
[2023-02-23 03:17:52,147] [ INFO] - Already cached /root/.paddlenlp/models/rocketqa-zh-nano-query-encoder/rocketqa-zh-nano-query-encoder.pdparams
[2023-02-23 03:18:02,831] [ INFO] - Already cached /root/.paddlenlp/models/rocketqa-zh-nano-para-encoder/rocketqa-zh-nano-para-encoder.pdparams
[2023-02-23 03:18:03,463] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-nano-query-encoder'.
[2023-02-23 03:18:03,463] [ INFO] - Already cached /root/.paddlenlp/models/rocketqa-zh-nano-query-encoder/ernie_3.0_nano_zh_vocab.txt
[2023-02-23 03:18:03,478] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/rocketqa-zh-nano-query-encoder/tokenizer_config.json
[2023-02-23 03:18:03,478] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/rocketqa-zh-nano-query-encoder/special_tokens_map.json
[2023-02-23 03:18:03,479] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-nano-para-encoder'.
[2023-02-23 03:18:03,479] [ INFO] - Already cached /root/.paddlenlp/models/rocketqa-zh-nano-para-encoder/ernie_3.0_nano_zh_vocab.txt
[2023-02-23 03:18:03,494] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/rocketqa-zh-nano-para-encoder/tokenizer_config.json
[2023-02-23 03:18:03,494] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/rocketqa-zh-nano-para-encoder/special_tokens_map.json
INFO - pipelines.utils.logger - Logged parameters: {'processor': 'TextSimilarityProcessor', 'tokenizer': 'NoneType', 'max_seq_len': '0', 'dev_split': '0.1'}
INFO - pipelines.document_stores.elasticsearch - Updating embeddings for all 0 docs ...
Updating embeddings: 0 Docs [00:00, ? Docs/s]

fzg0202 commented 1 year ago

@w5688414 On the Linux server (CUDA 11.6) the error is: OSError: (External) CUDA error(804), forward compatibility was attempted on non supported HW. [Hint: 'cudaErrorCompatNotSupportedOnDevice'. This error indicates that the system was upgraded to run with forward compatibility but the visible hardware detected by CUDA does not support this configuration. Refer to the compatibility documentation for the supported hardware matrix or ensure that only supported hardware is visible during initialization via the CUDA_VISIBLE_DEVICES environment variable.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:66) Does this mean I have to build my own Docker image?

w5688414 commented 1 year ago

For CUDA 11.6 you need to build the image yourself; there is currently no prebuilt image for a version that high. https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/docker/linux-docker.html
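
To see which CUDA version an installed paddlepaddle-gpu wheel targets, the version string reported earlier in the thread (2.3.2.post112) can be decoded. This is a minimal sketch that assumes the `postNNN` suffix encodes the CUDA version the wheel was built against (112 meaning CUDA 11.2), which matches the cuda11.2 tag of the image used here. The helper name `cuda_from_paddle_version` is hypothetical.

```python
import re
from typing import Optional


def cuda_from_paddle_version(version: str) -> Optional[str]:
    """Decode the CUDA version from a paddlepaddle-gpu version string.

    Assumes a trailing ".postNNN" suffix where the last digit is the CUDA
    minor version and the preceding digits are the major version.
    Returns None when no such suffix is present.
    """
    match = re.search(r"\.post(\d{3,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    # Version string reported from inside the container earlier in the thread.
    print(cuda_from_paddle_version("2.3.2.post112"))
```

Comparing this value with the CUDA version on the host makes the mismatch explicit before deciding to build a custom image.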