About the bug of paddleocr running with distributed.launch

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

https://paddlepaddle.github.io/PaddleOCR/

Apache License 2.0


About the bug of paddleocr running with distributed.launch #13218

Closed: LDOUBLEV closed this issue 2 months ago

LDOUBLEV commented 3 months ago

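Problem Description

The code that obtains the GPU id needs improvement: when PaddleOCR is invoked on multiple GPUs through paddle.distributed.launch, does the model end up running only on the first GPU?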

Runtime Environment

OS:
Paddle:
PaddleOCR:

Reproduction Code

Complete Error Message

Possible Solutions

Appendix

GreatV commented 3 months ago

@LDOUBLEV, could you find some time to fix this? As far as I can tell, inference has only ever been able to use a single GPU.

GreatV commented 3 months ago

Also, looking at the code just before that, on Windows it simply returns gpu_id=0.

LDOUBLEV commented 3 months ago

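Could you folks fix it? https://github.com/PaddlePaddle/PaddleOCR/blob/433677182f108c1be413ee8a92815bc13b205737/tools/infer/utility.py#L250 The main thing is to confirm with the inference team how the parameters in the inference Config should be set: whether gpu_id must be passed in, and whether inference can be run under distributed.launch at all.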


jzhang533 commented 3 months ago

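@LDOUBLEV Weiwei, the PaddleOCR project is currently maintained mostly by volunteers working out of goodwill. GreatV is not a Baidu employee either, so he has no way to reach the inference team.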

GreatV commented 3 months ago

https://www.paddlepaddle.org.cn/inference/master/api_reference/cxx_api_doc/Config/GPUConfig.html#gpu Judging from this, gpu_id does have to be passed in.

LDOUBLEV commented 3 months ago


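Looks like it. Thinking about it more, running inference in parallel with distributed.launch is not really appropriate. This case was sent to me by a user whose main need is to run ppocr inference in parallel across multiple GPUs. That can be done by wrapping it in multiple processes, with each process initializing the ppocr inference model on its own GPU.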
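The multi-process approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from PaddleOCR: `shard`, `run_on_gpus`, and `make_model` are hypothetical names, and `make_model(gpu_id)` stands for a user-supplied, module-level function that builds one OCR model on the given GPU and returns a callable taking an image path. How the model is pinned to a specific GPU is deliberately left to that function, since it depends on the PaddleOCR version in use.

```python
import multiprocessing as mp


def shard(items, num_shards):
    """Split items round-robin so each worker gets a similar share."""
    return [items[i::num_shards] for i in range(num_shards)]


def _worker(gpu_id, paths, make_model, queue):
    # Each process builds its own model instance for its own GPU.
    # make_model is a hypothetical user-supplied factory (see lead-in).
    model = make_model(gpu_id)
    queue.put([(path, model(path)) for path in paths])


def run_on_gpus(paths, gpu_ids, make_model):
    """Run make_model(gpu_id)(path) for every path, one process per GPU."""
    # "spawn" starts clean interpreters, so no GPU state is inherited
    # from the parent process.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=_worker, args=(gpu_id, part, make_model, queue))
        for gpu_id, part in zip(gpu_ids, shard(paths, len(gpu_ids)))
    ]
    for proc in procs:
        proc.start()
    results = []
    for _ in procs:
        # Drain the queue before joining so workers never block on a full pipe.
        results.extend(queue.get())
    for proc in procs:
        proc.join()
    return results
```

A caller would invoke something like `run_on_gpus(image_paths, [0, 1], make_model)` from under an `if __name__ == "__main__":` guard, which the spawn start method requires. The sketch does not handle a worker that crashes before reporting (the parent would wait indefinitely), so a real wrapper would want a timeout or error reporting.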

That said, the Windows handling there still has a bug.

We could also add a warning telling users that inference runs on the first GPU by default. @GreatV
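One way such a warning might look is sketched below. It is not the code in `tools/infer/utility.py`: `default_gpu_id` is a hypothetical helper, and it assumes that the set of usable GPUs is described by the `CUDA_VISIBLE_DEVICES` environment variable. Because CUDA renumbers the visible devices from 0, the first card is always index 0 inside the process.

```python
import os
import warnings


def default_gpu_id(env=None):
    """Return the device index used for inference; warn if other GPUs sit idle."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    visible = [dev.strip() for dev in raw.split(",") if dev.strip()]
    if len(visible) > 1:
        warnings.warn(
            f"{len(visible)} GPUs are visible ({','.join(visible)}), but "
            "inference runs on the first one only; start one process per "
            "GPU to use them all.",
            stacklevel=2,
        )
    # Index 0 among the visible devices, i.e. the first card.
    return 0
```

Reading the variable from `os.environ` behaves the same on Windows and Linux. A limitation of this sketch is that it stays silent when `CUDA_VISIBLE_DEVICES` is unset, even on a multi-GPU machine.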

GreatV commented 3 months ago

@LDOUBLEV Got it.