PaddleX based on Docker doesn't seem to set CUDA properly.

bit-scientist commented 2 weeks ago

Issue Description

Initially, I posted this question here, but was advised by @Bobholamovic to move it to this repo.

I am running the PP-ChatOCRv3-doc pipeline in a Docker container started with:

docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it registry.baidubce.com/paddlex/paddlex:paddlex3.0.0b1-paddlepaddle3.0.0b1-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash

The script is:

from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="PP-ChatOCRv3-doc",
    llm_name="ernie-3.5",
    # llm_params={"api_type": "qianfan", "ak": "", "sk": ""}
    llm_params={"api_type": "aistudio", "access_token": "my_token"} # Please enter your access_token; otherwise, the large model cannot be invoked.
)

visual_result, visual_info = pipeline.visual_predict("my_custom.pdf")

for res in visual_result:
    res.save_to_img("./output")
    res.save_to_html('./output')
    res.save_to_xlsx('./output')

vector = pipeline.build_vector(visual_info=visual_info)

chat_result = pipeline.chat(
    key_list=["乙方", "手机号"],
    visual_info=visual_info,
    vector=vector,
    )
chat_result.print()

However, running the script above throws the following error:


==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Using official model (RT-DETR-H_layout_3cls), the model files will be be automatically downloaded and saved in /root/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/RT-DETR-H_layout_3cls_infer.tar ...
Downloading RT-DETR-H_layout_3cls_infer.tar ...
[==================================================] 100.00%
Extracting RT-DETR-H_layout_3cls_infer.tar
[==================================================] 100.00%
Using official model (PP-OCRv4_server_det), the model files will be be automatically downloaded and saved in /root/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PP-OCRv4_server_det_infer.tar ...
Downloading PP-OCRv4_server_det_infer.tar ...
[==================================================] 100.00%
Extracting PP-OCRv4_server_det_infer.tar
[==================================================] 100.00%
Using official model (PP-OCRv4_server_rec), the model files will be be automatically downloaded and saved in /root/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PP-OCRv4_server_rec_infer.tar ...
Downloading PP-OCRv4_server_rec_infer.tar ...
[==================================================] 100.00%
Extracting PP-OCRv4_server_rec_infer.tar
[==================================================] 100.00%
Using official model (SLANet_plus), the model files will be be automatically downloaded and saved in /root/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/SLANet_plus_infer.tar ...
Downloading SLANet_plus_infer.tar ...
[==================================================] 100.00%
Extracting SLANet_plus_infer.tar
[==================================================] 100.00%
Using official model (PP-OCRv4_server_seal_det), the model files will be be automatically downloaded and saved in /root/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PP-OCRv4_server_seal_det_infer.tar ...
Downloading PP-OCRv4_server_seal_det_infer.tar ...
[==================================================] 100.00%
Extracting PP-OCRv4_server_seal_det_infer.tar
[==================================================] 100.00%
Using official model (PP-OCRv4_server_rec), the model files will be be automatically downloaded and saved in /root/.paddlex/official_models.
/usr/local/lib/python3.10/dist-packages/setuptools-68.2.2-py3.10.egg/_distutils_hack/__init__.py:18: UserWarning: Distutils was imported before Setuptools, but importing Setuptools also replaces the `distutils` module in `sys.modules`. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is installed in the traditional way (e.g. not an editable install), and/or make sure that setuptools is always imported before distutils.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/setuptools-68.2.2-py3.10.egg/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
/usr/local/lib/python3.10/dist-packages/paddle/base/framework.py:743: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  warnings.warn(
Traceback (most recent call last):
  File "/paddle/test.py", line 10, in <module>
    visual_result, visual_info = pipeline.visual_predict("/paddle/HE-2203-202208-002.pdf")
  File "/root/PaddleX/paddlex/inference/pipelines/ppchatocrv3/ppchatocrv3.py", line 209, in visual_predict
    visual_result = list(
  File "/root/PaddleX/paddlex/inference/pipelines/ppchatocrv3/ppchatocrv3.py", line 252, in get_visual_result
    for idx, (img_info, layout_pred) in enumerate(
  File "/root/PaddleX/paddlex/inference/models/base/base_predictor.py", line 47, in __call__
    for res in super().__call__(input):
  File "/root/PaddleX/paddlex/inference/components/base.py", line 57, in __call__
    for each_output in output:
  File "/root/PaddleX/paddlex/inference/models/base/basic_predictor.py", line 48, in apply
    yield from self._generate_res(self.engine(input))
  File "/root/PaddleX/paddlex/inference/utils/process_hook.py", line 46, in _wrapper
    for ele in input_:
  File "/root/PaddleX/paddlex/inference/components/base.py", line 277, in __call__
    yield from self.__call__(data, i + 1)
  File "/root/PaddleX/paddlex/inference/components/base.py", line 277, in __call__
    yield from self.__call__(data, i + 1)
  File "/root/PaddleX/paddlex/inference/components/base.py", line 277, in __call__
    yield from self.__call__(data, i + 1)
  [Previous line repeated 1 more time]
  File "/root/PaddleX/paddlex/inference/components/base.py", line 276, in __call__
    for data in data_gen:
  File "/root/PaddleX/paddlex/inference/components/base.py", line 43, in __call__
    output = self.apply(**args)
  File "/root/PaddleX/paddlex/inference/components/paddle_predictor/predictor.py", line 167, in apply
    self.reset()
  File "/root/PaddleX/paddlex/inference/components/paddle_predictor/predictor.py", line 49, in reset
    ) = self._create()
  File "/root/PaddleX/paddlex/inference/components/paddle_predictor/predictor.py", line 145, in _create
    predictor = create_predictor(config)
ValueError: (InvalidArgument) Device id must be less than GPU count, but received id is: 0. GPU count is: 0.
  [Hint: Expected id < GetGPUDeviceCount(), but received id:0 >= GetGPUDeviceCount():0.] (at ../paddle/phi/backends/gpu/cuda/cuda_info.cc:255)

I checked whether the container sees the GPU with:

import paddle
gpu_available  = paddle.device.is_compiled_with_cuda()
print("GPU available:", gpu_available) # returned True
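
Note that `paddle.device.is_compiled_with_cuda()` only reports whether the installed Paddle build includes CUDA support. It does not say whether a GPU is visible at runtime, which is consistent with it returning True while the error above reports a GPU count of 0. A minimal sketch that checks both, assuming `paddle.device.cuda.device_count()` is available in the installed version (the helper name `gpu_status` is hypothetical):

```python
import importlib


def gpu_status():
    """Report build-time CUDA support and the number of GPUs visible at runtime.

    `gpu_status` is a hypothetical helper name used only for illustration.
    """
    try:
        paddle = importlib.import_module("paddle")
    except ImportError:
        # Paddle is not installed in this environment, so nothing can be checked.
        return {
            "paddle_installed": False,
            "compiled_with_cuda": None,
            "visible_gpus": None,
        }
    return {
        "paddle_installed": True,
        # True only means the wheel was built with CUDA support.
        "compiled_with_cuda": paddle.device.is_compiled_with_cuda(),
        # Number of GPUs the runtime can actually use; 0 matches the error above.
        "visible_gpus": paddle.device.cuda.device_count(),
    }


if __name__ == "__main__":
    print(gpu_status())
```

If `compiled_with_cuda` is True but `visible_gpus` is 0, the build is fine and the problem is more likely in how the GPU is exposed to the container.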

I also ran nvcc -V with output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

This line caught my attention, but I have had no luck resolving it so far: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.

Additionally, nvidia-smi returns:

Commands

python -c "import paddle; print(paddle.static.cuda_places())"
python -c "from paddle import core; print(core.get_cuda_device_count())"
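
A driver-level check that does not depend on Paddle can help separate the two layers. The sketch below assumes `nvidia-smi` is on the PATH inside the container and uses its `-L` flag to list devices. The helper name `list_gpus_via_smi` is hypothetical.

```python
import shutil
import subprocess


def list_gpus_via_smi():
    """Return the GPU lines printed by `nvidia-smi -L`, or None if the tool cannot be run.

    `list_gpus_via_smi` is a hypothetical helper name used only for illustration.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # nvidia-smi is not on PATH, e.g. the container was started without GPU access.
        return None
    try:
        proc = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        # The tool exists but could not talk to the driver.
        return None
    # Each detected device is printed on its own line starting with "GPU <index>:".
    return [
        line.strip()
        for line in proc.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


if __name__ == "__main__":
    print(list_gpus_via_smi())
```

If this lists a device while Paddle still reports zero GPUs, the driver is reachable from the container and the mismatch lies between the CUDA runtime and Paddle.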

Do you have any idea why this is happening and how to solve it?

Version & Environment Information

Paddle version: 3.0.0-beta1
Paddle With CUDA: True

OS: ubuntu 20.04
GCC version: (GCC) 8.2.0
Clang version: N/A
CMake version: version 3.18.0
Libc version: glibc 2.31
Python version: 3.10.15

CUDA version: 11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
cuDNN version: N/A
Nvidia driver version: 560.94
Nvidia driver List:
GPU 0: NVIDIA GeForce RTX 3080 Ti

westfish commented 2 weeks ago

Paddle 3.0.0b2 has been released. Try reinstalling it and see whether that helps:

python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

bit-scientist commented 2 weeks ago

@westfish, thank you. Are you saying that I should start the container with the following command?

docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ /bin/bash

Since I am using a Docker container as stated above, it would be better if you could provide the full command, including the registry path and the cuDNN and TensorRT versions, similar to:

docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it registry.baidubce.com/paddlex/paddlex:paddlex3.0.0b1-paddlepaddle3.0.0b1-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash

cuicheng01 commented 2 weeks ago

Hi, what’s your operating system?

bit-scientist commented 2 weeks ago

@cuicheng01, I am on Windows 10. Does it really matter, though, given that I am using a Docker image, which is isolated from the host OS?
