PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

CUDA initialization fails when creating Predictors in multiple docker processes #2450

Closed nihuizhidao closed 2 years ago

nihuizhidao commented 3 years ago

Environment: docker paddlepaddle/paddle:1.8.4-gpu-cuda10.0-cudnn7

When creating a Predictor with AnalysisConfig inside each of several multiprocessing.Process workers (following the construction of the Detector class in deploy/python/infer.py), the following error is raised at:

predictor = fluid.core.create_paddle_predictor(config)

paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::SetDeviceId(int)
3   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
4   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
5   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] at (/paddle/paddle/fluid/platform/gpu_info.cc:212)

However, the same code runs fine as a single process, and I verified that the docker environment and CUDA are fine. In the multi-process case, how should multiple different Predictors be deployed?

Also, the AnalysisConfig settings are identical to deploy/python/infer.py, except that config.enable_use_gpu(initial_gpu_memory, gpu_id) pre-allocates 800 MB of GPU memory with initial_gpu_memory=800 (the GPU has plenty of free memory).

@qingqing01 Could you please take a look? It's a bit urgent... Thanks!

nihuizhidao commented 3 years ago

This problem is rather strange: on Windows (not in docker) everything works, but on CentOS 7.6 it fails whether or not docker is used. The NVIDIA components (CUDA etc.) in the system environment are all normal, and a single process works. What could be the cause?

nihuizhidao commented 3 years ago

Tested on CentOS 7.6 outside docker: paddlepaddle 2.0.1 has the same problem. Environment: CUDA 10.0 + cudnn 7.6.3 + nccl + export CUDA_VISIBLE_DEVICES='0' (two cards).

nihuizhidao commented 3 years ago

With paddlepaddle 2.0 and C++ debug information enabled, the error is as follows:

Process Process-2:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:


C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

Process Process-1:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:


C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

debug mode: True
main PID: 25803
[PROCESS STARTED] cam ID: c1com2i3j5ngtsi266v0, detector: yolov3_darknet_pedestrian, gpu_id: 0, PID: 25855 started.
deploy_file: /home/xxx/users/schwarz/xxxx/modelnferenceServing/model_repository/yolov3_darknet_pedestrian/infer_cfg.yml
[PROCESS STARTED] cam ID: c1com2i3j5ngtsi266v0, detector: yolov3_enhanced_coco, gpu_id: 0, PID: 25856 started.
[['c1com2i3j5ngtsi266v0', 'yolov3_darknet_pedestrian', 25855], ['c1com2i3j5ngtsi266v0', 'yolov3_enhanced_coco', 25856]]
deploy_file: /home/xxx/users/schwarz/xxxx/modelnferenceServing/model_repository/yolov3_enhanced_coco/infer_cfg.yml
----------- Paddle Detection Model Configuration -----------
name: yolov3_darknet_pedestrian
gpu_id: 0
Model Arch: YOLO
Use Paddle Executor: False
Transform Order:
-- preprocess op: Resize
-- preprocess op: Normalize
-- preprocess op: Permute

W0329 09:23:21.655159 25855 analysis_predictor.cc:1058] Deprecated. Please use CreatePredictor instead.

  • Debugger is active!
----------- Paddle Detection Model Configuration -----------
name: yolov3_enhanced_coco
gpu_id: 0
Model Arch: YOLO
Use Paddle Executor: False
Transform Order:
-- preprocess op: Resize
-- preprocess op: Normalize
-- preprocess op: Permute

  • Debugger PIN: 334-124-885
W0329 09:23:21.664276 25856 analysis_predictor.cc:1058] Deprecated. Please use CreatePredictor instead.
Process Process-1:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:

C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

Process Process-2:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/envs/paddle20/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/app_multi_process.py", line 48, in run_detector_in_process
    initial_gpu_memory=configs["initial_gpu_memory"])
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 809, in __init__
    gpu_id=gpu_id, name=name, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 667, in __init__
    use_gpu=use_gpu, gpu_id=gpu_id, initial_gpu_memory=initial_gpu_memory)
  File "/home/xxx/users/schwarz/xxxx/modelnferenceServing/ppdet_infer.py", line 609, in load_predictor
    predictor = fluid.core.create_paddle_predictor(config)
OSError:


C++ Traceback (most recent call last):

0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::SetDeviceId(int)
4   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

ExternalError: Cuda error(3), initialization error. [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

Shixiaowei02 commented 3 years ago

Hello, this looks like it is related to an incorrect CUDA configuration on the machine. Have you tried the same code in other environments, or deploying with the C++ API?

qingqing01 commented 3 years ago

@Shixiaowei02 He said above that a single process runs fine.

qingqing01 commented 3 years ago

@nihuizhidao In the multi-process case, is `predictor = fluid.core.create_paddle_predictor(config)` called inside each process? Device initialization happens inside this call, so it must be invoked within each process.

nihuizhidao commented 3 years ago

Hello, this looks like it is related to an incorrect CUDA configuration on the machine. Have you tried the same code in other environments, or deploying with the C++ API?

Right, a single process works fine, and multi-process also works on Windows...

nihuizhidao commented 3 years ago

@nihuizhidao In the multi-process case, is `predictor = fluid.core.create_paddle_predictor(config)` called inside each process? Device initialization happens inside this call, so it must be invoked within each process.

Yes, it is called inside each process. I was wondering whether there is a config.SetInValid()-like parameter that can ensure each config is only used to initialize one Predictor.

nihuizhidao commented 3 years ago

@nihuizhidao In the multi-process case, is `predictor = fluid.core.create_paddle_predictor(config)` called inside each process? Device initialization happens inside this call, so it must be invoked within each process.

Yes, it is called inside each process. I was wondering whether there is a config.SetInValid()-like parameter that can ensure each config is only used to initialize one Predictor.

But I tried SetInValid() and it doesn't seem to work either.

qingqing01 commented 3 years ago

@nihuizhidao The device id is set via enable_use_gpu in the Config; see the code below. So each Predictor should be paired with its own Config; this also needs attention.

https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/python_infer_cn.html#daimashili

nihuizhidao commented 3 years ago

@nihuizhidao The device id is set via enable_use_gpu in the Config; see the code below. So each Predictor should be paired with its own Config; this also needs attention.

https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/python_infer_cn.html#daimashili

This part of my code is based on deploy/python/infer.py:

    config = fluid.core.AnalysisConfig(
        os.path.join(model_dir, '__model__'),
        os.path.join(model_dir, '__params__'))
    if use_gpu:
        # initial GPU memory (MB), device ID
        """Schwarz modified this:
            use a global config for initial_gpu_memory and gpu_id
        """
        config.enable_use_gpu(initial_gpu_memory, gpu_id)
        # optimize graph and fuse op
        config.switch_ir_optim(True)
    else:
        config.disable_gpu()

    if run_mode in precision_map.keys():
        config.enable_tensorrt_engine(
            workspace_size=1 << 10,
            max_batch_size=batch_size,
            min_subgraph_size=min_subgraph_size,
            precision_mode=precision_map[run_mode],
            use_static=False,
            use_calib_mode=False)

    # disable print log when predict
    config.disable_glog_info()
    # enable shared memory
    config.enable_memory_optim()
    # disable feed, fetch OP, needed by zero_copy_run
    config.switch_use_feed_fetch_ops(False)
    predictor = fluid.core.create_predictor(config)
Shixiaowei02 commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork https://github.com/PaddlePaddle/Paddle/issues/25185
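In Python, the fork limitation described above can be sidestepped with the `spawn` start method, so each worker begins from a fresh interpreter and hence a fresh CUDA context. A minimal sketch of that pattern (not from the thread; the real AnalysisConfig / create_paddle_predictor calls are replaced by a comment, since paddle is not imported here):

```python
import multiprocessing as mp

def plan_workers(n_workers, gpu_id):
    """Pure helper: one (worker_index, gpu_id) pair per worker."""
    return [(i, gpu_id) for i in range(n_workers)]

def worker(idx, gpu_id, queue):
    # In the real deployment this is where AnalysisConfig would be built
    # and fluid.core.create_paddle_predictor(config) called -- inside a
    # child that was spawned, not forked, so CUDA can initialize cleanly.
    queue.put((idx, gpu_id))

if __name__ == "__main__":
    ctx = mp.get_context("spawn")          # fresh interpreter per worker
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, g, queue))
             for i, g in plan_workers(2, 0)]
    for p in procs:
        p.start()
    results = sorted(queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)  # -> [(0, 0), (1, 0)]
```

Whether spawn suffices here was not verified in this thread; the maintainers instead suggest threads or Popen below.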

nihuizhidao commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

Shixiaowei02 commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

With multiple threads, one config + predictor per thread plus a lock is enough.
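The "one config + predictor per thread, plus a lock" suggestion can be sketched as follows. Here `load_predictor` is a hypothetical stand-in for the real per-thread AnalysisConfig + create_paddle_predictor sequence (stubbed so the sketch runs without paddle), and the lock serializes predictor creation:

```python
import threading

_create_lock = threading.Lock()

def run_workers(model_names, load_predictor):
    """One predictor per thread; creation happens under a shared lock."""
    predictors = {}

    def worker(name):
        with _create_lock:                    # serialize predictor creation
            predictors[name] = load_predictor(name)
        # ... run inference with predictors[name] in this thread ...

    threads = [threading.Thread(target=worker, args=(n,))
               for n in model_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return predictors

# Stub load_predictor: returns a tag instead of a real paddle predictor.
preds = run_workers(["yolov3_darknet_pedestrian", "yolov3_enhanced_coco"],
                    lambda name: "predictor-for-" + name)
print(sorted(preds))
```

As the next comment notes, the GIL limits the Python-side throughput of this design, though the heavy GPU work releases the GIL inside the C++ predictor.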

nihuizhidao commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

With multiple threads, one config + predictor per thread plus a lock is enough.

But Python multi-threading has performance problems (the GIL)... https://blog.csdn.net/baidu_36669549/article/details/95094464 The multi-process problem can be solved in pytorch simply by loading the model inside the child process, but with paddle, initializing the config and Predictor inside the child process fails just the same... @Shixiaowei02

Shixiaowei02 commented 3 years ago

Hello, CUDA does not support multi-processing via fork; multi-threading is recommended instead. Please refer to these issues: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork PaddlePaddle/Paddle#25185

Er... I'm a bit confused. Isn't AnalysisConfig supposed to be thread-unsafe?

With multiple threads, one config + predictor per thread plus a lock is enough.

But Python multi-threading has performance problems (the GIL)... https://blog.csdn.net/baidu_36669549/article/details/95094464 The multi-process problem can be solved in pytorch simply by loading the model inside the child process, but with paddle, initializing the config and Predictor inside the child process fails just the same... @Shixiaowei02

https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/distributed/fleet/launch_utils.py Could you try the Popen approach?

qingqing01 commented 3 years ago

@nihuizhidao

If you are in a hurry, you could start multiple processes via subprocess.Popen(), similar to how multi-card multi-process training does it.

https://github.com/PaddlePaddle/Paddle/blob/bfb5cf5567a604fded177d90d639f7337015e3fa/python/paddle/distributed/fleet/launch_utils.py#L455

Yes, as @Shixiaowei02 mentioned, the fork approach will take us some time to verify.
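A bare-bones version of the Popen approach looks like this; the `-c` payload is a hypothetical stand-in for a worker script that would build its own config and predictor. Because each child is a fully independent interpreter (never forked), it can initialize CUDA on its own:

```python
import subprocess
import sys

def launch_workers(n):
    """Start n independent interpreter processes, as the fleet launch
    utilities do, and collect their stdout."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c",
             f"print('worker {i} ready')"],  # real code: run the worker script
            stdout=subprocess.PIPE)
        for i in range(n)
    ]
    outputs = [p.communicate()[0].decode().strip() for p in procs]
    assert all(p.returncode == 0 for p in procs)
    return outputs

print(launch_workers(2))  # -> ['worker 0 ready', 'worker 1 ready']
```

Results would then flow back through files, sockets, or a queue service rather than Python objects, which is the main cost of this design compared to multiprocessing.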

nihuizhidao commented 3 years ago

@nihuizhidao

If you are in a hurry, you could start multiple processes via subprocess.Popen(), similar to how multi-card multi-process training does it.

https://github.com/PaddlePaddle/Paddle/blob/bfb5cf5567a604fded177d90d639f7337015e3fa/python/paddle/distributed/fleet/launch_utils.py#L455

Yes, as @Shixiaowei02 mentioned, the fork approach will take us some time to verify.

Thanks for the quick reply.

That approach uses an independent device_id per process, right? It suits training, but mine is an inference scenario, and dedicating one GPU to each process is rather wasteful...

qingqing01 commented 3 years ago

@nihuizhidao If you want multiple processes to share one GPU, just set the same device_id in each process.

aishangmaxiaoming commented 2 years ago

@nihuizhidao If you want multiple processes to share one GPU, just set the same device_id in each process.

How should the code be changed for this? In infer.py??

paddle-bot-old[bot] commented 2 years ago

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.

WooXinyi commented 2 years ago

@nihuizhidao If you want multiple processes to share one GPU, just set the same device_id in each process.

How should the code be changed for this? In infer.py??

Hello, have you found a multi-process inference approach?