PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

CUDA initialization fails when creating Predictors in multiple docker processes #2450

Closed nihuizhidao closed 2 years ago

nihuizhidao commented 3 years ago

Environment: docker paddlepaddle/paddle:1.8.4-gpu-cuda10.0-cudnn7

When creating a Predictor with AnalysisConfig inside each of several multiprocessing.Process workers (following the construction of the Detector class in deploy/python/infer.py), the following error is raised at:

predictor = fluid.core.create_paddle_predictor(config)

paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::SetDeviceId(int)
3   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
4   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
5   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] at (/paddle/paddle/fluid/platform/gpu_info.cc:212)

However, the same code runs fine as a single process, and I verified that the docker environment and CUDA are fine. In the multi-process case, how should multiple different Predictors be deployed?

Also, the AnalysisConfig settings are identical to deploy/python/infer.py, except that config.enable_use_gpu(initial_gpu_memory, gpu_id) pre-allocates 800 MB of GPU memory with initial_gpu_memory=800 (the GPU has plenty of free memory).

@qingqing01 Could you please take a look? It's a bit urgent... Thanks!

nihuizhidao commented 3 years ago

This problem is rather strange: on Windows (not in docker) everything works, but on CentOS 7.6 it fails whether or not docker is used. The NVIDIA components (CUDA etc.) in the system environment are all normal, and a single process works. What could be the cause?

nihuizhidao commented 3 years ago

Tested on CentOS 7.6 outside docker: paddlepaddle 2.0.1 has the same problem. Environment: CUDA 10.0 + cudnn 7.6.3 + nccl + export CUDA_VISIBLE_DEVICES='0' (two cards).

nihuizhidao commented 3 years ago

With paddlepaddle 2.0 and C++ debug information enabled, the error is as follows:

Process Process-2:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:


C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

Process Process-1:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:


C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

debug mode: True
main PID: 25803
[PROCESS STARTED] cam ID: c1com2i3j5ngtsi266v0, detector: yolov3_darknet_pedestrian, gpu_id: 0, PID: 25855 started.
deploy_file: /home/xxx/users/schwarz/xxxx/modelnferenceServing/model_repository/yolov3_darknet_pedestrian/infer_cfg.yml
[PROCESS STARTED] cam ID: c1com2i3j5ngtsi266v0, detector: yolov3_enhanced_coco, gpu_id: 0, PID: 25856 started.
[['c1com2i3j5ngtsi266v0', 'yolov3_darknet_pedestrian', 25855], ['c1com2i3j5ngtsi266v0', 'yolov3_enhanced_coco', 25856]]
deploy_file: /home/xxx/users/schwarz/xxxx/modelnferenceServing/model_repository/yolov3_enhanced_coco/infer_cfg.yml
----------- Paddle Detection Model Configuration -----------
name: yolov3_darknet_pedestrian
gpu_id: 0
Model Arch: YOLO
Use Paddle Executor: False
Transform Order:
-- preprocess op: Resize
-- preprocess op: Normalize
-- preprocess op: Permute

W0329 09:23:21.655159 25855 analysis_predictor.cc:1058] Deprecated. Please use CreatePredictor instead.

  • Debugger is active!
----------- Paddle Detection Model Configuration -----------
name: yolov3_enhanced_coco
gpu_id: 0
Model Arch: YOLO
Use Paddle Executor: False
Transform Order:
-- preprocess op: Resize
-- preprocess op: Normalize
-- preprocess op: Permute

  • Debugger PIN: 334-124-885
W0329 09:23:21.664276 25856 analysis_predictor.cc:1058] Deprecated. Please use CreatePredictor instead.
Process Process-1:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:

C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

Process Process-2:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:


C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

Shixiaowei02 commented 3 years ago

Hello, this looks like it is related to an incorrect CUDA configuration on the machine. Have you tried the same code in other environments, or deploying with the C++ API?

qingqing01 commented 3 years ago

@Shixiaowei02 He said above that a single process runs fine.

qingqing01 commented 3 years ago

@nihuizhidao In the multi-process case, is `predictor = fluid.core.create_paddle_predictor(config)` called inside each process? Device initialization happens inside this call, so it must be invoked within each process.

nihuizhidao commented 3 years ago

Hello, this looks like it is related to an incorrect CUDA configuration on the machine. Have you tried the same code in other environments, or deploying with the C++ API?

Right, a single process works fine, and multi-process also works on Windows...

nihuizhidao commented 3 years ago

@nihuizhidao In the multi-process case, is `predictor = fluid.core.create_paddle_predictor(config)` called inside each process? Device initialization happens inside this call, so it must be invoked within each process.

Yes, it is called inside each process. I was wondering whether there is a config.SetInValid()-like parameter that can ensure each config is only used to initialize one Predictor.

nihuizhidao commented 3 years ago

@nihuizhidao In the multi-process case, is `predictor = fluid.core.create_paddle_predictor(config)` called inside each process? Device initialization happens inside this call, so it must be invoked within each process.

Yes, it is called inside each process. I was wondering whether there is a config.SetInValid()-like parameter that can ensure each config is only used to initialize one Predictor.

But I tried SetInValid() and it doesn't seem to work either.

qingqing01 commented 3 years ago

@nihuizhidao The device id is set via enable_use_gpu in the Config; see the code below. So each Predictor should be paired with its own Config; this also needs attention.

https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/python_infer_cn.html#daimashili

nihuizhidao commented 3 years ago

@nihuizhidao The device id is set via enable_use_gpu in the Config; see the code below. So each Predictor should be paired with its own Config; this also needs attention.

https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/python_infer_cn.html#daimashili

This part of my code is based on deploy/python/infer.py:

    config = fluid.core.AnalysisConfig(
        os.path.join(model_dir, '__model__'),
        os.path.join(model_dir, '__params__'))
    if use_gpu:
        # initial GPU memory (MB), device ID
        """Schwarz modified this:
            use a global config for initial_gpu_memory and gpu_id
        """
        config.enable_use_gpu(initial_gpu_memory, gpu_id)
        # optimize graph and fuse op
        config.switch_ir_optim(True)
    else:
        config.disable_gpu()

    if run_mode in precision_map.keys():
        config.enable_tensorrt_engine(
            workspace_size=1 << 10,
            max_batch_size=batch_size,
            min_subgraph_size=min_subgraph_size,
            precision_mode=precision_map[run_mode],
            use_static=False,
            use_calib_mode=False)

    # disable print log when predict
    config.disable_glog_info()
    # enable shared memory
    config.enable_memory_optim()
    # disable feed, fetch OP, needed by zero_copy_run
    config.switch_use_feed_fetch_ops(False)
    predictor = fluid.core.create_predictor(config)
Shixiaowei02 commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork https://github.com/PaddlePaddle/Paddle/issues/25185
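In Python, the fork limitation described above can be sidestepped with the `spawn` start method, so each worker begins from a fresh interpreter and hence a fresh CUDA context. A minimal sketch of that pattern (not from the thread; the real AnalysisConfig / create_paddle_predictor calls are replaced by a comment, since paddle is not imported here):

```python
import multiprocessing as mp

def plan_workers(n_workers, gpu_id):
    """Pure helper: one (worker_index, gpu_id) pair per worker."""
    return [(i, gpu_id) for i in range(n_workers)]

def worker(idx, gpu_id, queue):
    # In the real deployment this is where AnalysisConfig would be built
    # and fluid.core.create_paddle_predictor(config) called -- inside a
    # child that was spawned, not forked, so CUDA can initialize cleanly.
    queue.put((idx, gpu_id))

if __name__ == "__main__":
    ctx = mp.get_context("spawn")          # fresh interpreter per worker
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, g, queue))
             for i, g in plan_workers(2, 0)]
    for p in procs:
        p.start()
    results = sorted(queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)  # -> [(0, 0), (1, 0)]
```

Whether spawn suffices here was not verified in this thread; the maintainers instead suggest threads or Popen below.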

nihuizhidao commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

Shixiaowei02 commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

With multiple threads, one config + predictor per thread plus a lock is enough.
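The "one config + predictor per thread, plus a lock" suggestion can be sketched as follows. Here `load_predictor` is a hypothetical stand-in for the real per-thread AnalysisConfig + create_paddle_predictor sequence (stubbed so the sketch runs without paddle), and the lock serializes predictor creation:

```python
import threading

_create_lock = threading.Lock()

def run_workers(model_names, load_predictor):
    """One predictor per thread; creation happens under a shared lock."""
    predictors = {}

    def worker(name):
        with _create_lock:                    # serialize predictor creation
            predictors[name] = load_predictor(name)
        # ... run inference with predictors[name] in this thread ...

    threads = [threading.Thread(target=worker, args=(n,))
               for n in model_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return predictors

# Stub load_predictor: returns a tag instead of a real paddle predictor.
preds = run_workers(["yolov3_darknet_pedestrian", "yolov3_enhanced_coco"],
                    lambda name: "predictor-for-" + name)
print(sorted(preds))
```

As the next comment notes, the GIL limits the Python-side throughput of this design, though the heavy GPU work releases the GIL inside the C++ predictor.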

nihuizhidao commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

With multiple threads, one config + predictor per thread plus a lock is enough.

But Python multi-threading has performance problems (the GIL)... https://blog.csdn.net/baidu_36669549/article/details/95094464 The multi-process problem can be solved in pytorch simply by loading the model inside the child process, but with paddle, initializing the config and Predictor inside the child process fails just the same... @Shixiaowei02

Shixiaowei02 commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

With multiple threads, one config + predictor per thread plus a lock is enough.

But Python multi-threading has performance problems (the GIL)... https://blog.csdn.net/baidu_36669549/article/details/95094464 The multi-process problem can be solved in pytorch simply by loading the model inside the child process, but with paddle, initializing the config and Predictor inside the child process fails just the same... @Shixiaowei02

https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/distributed/fleet/launch_utils.py Could you try the Popen approach?

qingqing01 commented 3 years ago

@nihuizhidao

If you are in a hurry, you could start multiple processes via subprocess.Popen(), similar to how multi-card multi-process training does it.

https://github.com/PaddlePaddle/Paddle/blob/bfb5cf5567a604fded177d90d639f7337015e3fa/python/paddle/distributed/fleet/launch_utils.py#L455

Yes, as @Shixiaowei02 mentioned, the fork approach will take us some time to verify.
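A bare-bones version of the Popen approach looks like this; the `-c` payload is a hypothetical stand-in for a worker script that would build its own config and predictor. Because each child is a fully independent interpreter (never forked), it can initialize CUDA on its own:

```python
import subprocess
import sys

def launch_workers(n):
    """Start n independent interpreter processes, as the fleet launch
    utilities do, and collect their stdout."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c",
             f"print('worker {i} ready')"],  # real code: run the worker script
            stdout=subprocess.PIPE)
        for i in range(n)
    ]
    outputs = [p.communicate()[0].decode().strip() for p in procs]
    assert all(p.returncode == 0 for p in procs)
    return outputs

print(launch_workers(2))  # -> ['worker 0 ready', 'worker 1 ready']
```

Results would then flow back through files, sockets, or a queue service rather than Python objects, which is the main cost of this design compared to multiprocessing.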

nihuizhidao commented 3 years ago

@nihuizhidao

If you are in a hurry, you could start multiple processes via subprocess.Popen(), similar to how multi-card multi-process training does it.

https://github.com/PaddlePaddle/Paddle/blob/bfb5cf5567a604fded177d90d639f7337015e3fa/python/paddle/distributed/fleet/launch_utils.py#L455

Yes, as @Shixiaowei02 mentioned, the fork approach will take us some time to verify.

Thanks for the quick reply.

That approach uses an independent device_id per process, right? It suits training, but mine is an inference scenario, and dedicating one GPU to each process is rather wasteful...

qingqing01 commented 3 years ago

@nihuizhidao If you want multiple processes to share one GPU, just set the same device_id in each process.

aishangmaxiaoming commented 2 years ago

@nihuizhidao If you want multiple processes to share one GPU, just set the same device_id in each process.

How should the code be changed for this? In infer.py??

paddle-bot-old[bot] commented 2 years ago

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.

WooXinyi commented 2 years ago

@nihuizhidao If you want multiple processes to share one GPU, just set the same device_id in each process.

How should the code be changed for this? In infer.py??

Hello, have you found a multi-process inference approach?