nissansz commented 6 months ago

Please provide the following information to quickly locate the problem

System Environment:
Version: Paddle: PaddleOCR: 2.7.5 Related components:
Command Code:
Complete Error Message:

import paddle

# required: gpu

paddle.set_device("gpu")

tensor = paddle.randn([512, 512, 512], "float")

del tensor

paddle.device.cuda.empty_cache()

TingquanGao commented 6 months ago

Did you run into an error? What command were you running?

TingquanGao commented 6 months ago

empty_cache only releases GPU memory that is no longer in use; you normally don't need to call it manually.
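
To see what empty_cache changes in practice, one can compare allocated and reserved GPU memory around the call. This is a minimal sketch, assuming a GPU build of Paddle that provides `paddle.device.cuda.memory_allocated` and `paddle.device.cuda.memory_reserved` (available from roughly Paddle 2.3 onward, as far as I know). `measure_empty_cache` is a hypothetical helper name, and it returns `None` when no usable GPU build is found.

```python
def measure_empty_cache(shape=(256, 256, 256)):
    """Report GPU memory (bytes) around empty_cache(); None without a GPU build."""
    try:
        import paddle
    except ImportError:
        return None
    cuda = paddle.device.cuda
    # Assumption: these monitoring calls exist in the installed Paddle version.
    if not hasattr(cuda, "memory_reserved") or not hasattr(cuda, "memory_allocated"):
        return None
    if not paddle.device.is_compiled_with_cuda() or cuda.device_count() == 0:
        return None

    paddle.set_device("gpu")

    tensor = paddle.randn(list(shape), "float32")
    stats = {"allocated_with_tensor": cuda.memory_allocated()}

    # Dropping the last reference returns the block to Paddle's allocator,
    # which may keep it reserved for reuse instead of giving it back to the driver.
    del tensor
    stats["allocated_after_del"] = cuda.memory_allocated()
    stats["reserved_before"] = cuda.memory_reserved()

    # empty_cache() asks the allocator to release reserved-but-unused memory.
    cuda.empty_cache()
    stats["reserved_after"] = cuda.memory_reserved()
    return stats


if __name__ == "__main__":
    print(measure_empty_cache())
```

If the reserved figure drops after the call while the allocated figure is unchanged, that matches the description above: only idle cached memory is released, and tensors still in use are not touched.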

nissansz commented 6 months ago

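> Did you run into an error? What command were you running?

Right after an image is generated, I run paddleocr.py on it to recognize the result, and I get the error below. Is there a way of running recognition that avoids this kind of error?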

File "C:\F\pycharm2020.2\PaddleOCR-2.7.5\paddleocr.py", line 712, in ocr
    rec_res, elapse = self.text_recognizer(img)
File "C:\F\pycharm2020.2\PaddleOCR-2.7.5\tools\infer\predict_rec.py", line 669, in __call__
    self.input_tensor.copy_from_cpu(norm_img_batch)
File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\inference\wrapper.py", line 36, in tensor_copy_from_cpu
    self.copy_from_cpu_bind(data)
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases can be found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:251)
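
The hint in this error says that once the failure occurs, the process has to be terminated and relaunched before CUDA can be used again. One way to live with that, sketched here under assumptions, is to run each recognition job in a short-lived child process: if the child crashes, the parent survives, and the GPU memory held by the child is returned when it exits. `run_isolated` and the worker snippet are hypothetical names; the worker is a stand-in that only echoes its input, and a real one would create and call the OCR engine at that point.

```python
import json
import subprocess
import sys


def run_isolated(worker_code, payload, retries=1, timeout=600):
    """Run worker_code in a fresh Python process and return its JSON result.

    The payload is sent as JSON on stdin, and the worker must print exactly one
    JSON document on stdout. If the child exits with a non-zero status (for
    example after a CUDA launch failure), it is relaunched up to `retries`
    more times before giving up.
    """
    last_error = ""
    for _ in range(retries + 1):
        proc = subprocess.run(
            [sys.executable, "-c", worker_code],
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if proc.returncode == 0:
            return json.loads(proc.stdout)
        last_error = proc.stderr
    raise RuntimeError("worker failed after retries:\n" + last_error)


# Stand-in worker (hypothetical): echoes the image path with a placeholder text.
WORKER = """
import json, sys
job = json.load(sys.stdin)
print(json.dumps({"image": job["image"], "text": "placeholder"}))
"""

if __name__ == "__main__":
    print(run_isolated(WORKER, {"image": "sample.png"}))
```

The trade-off is startup cost: every child has to load the model again, so handing several images to one child per launch is likely more practical than one process per image. A real worker would also need to keep library log output off stdout so that the JSON result stays parseable.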

TingquanGao commented 6 months ago

Could you try this?

import paddle
paddle.utils.install_check.run_check()

nissansz commented 6 months ago

checkgpu.py
Running verify PaddlePaddle program ...
W0416 11:22:26.553992 9688 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
W0416 11:22:26.973874 9688 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Process finished with exit code 0

TingquanGao commented 6 months ago

Which versions of paddle, CUDA, and cuDNN are you using?
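
The three version numbers asked for here can be read from the installed package itself. A minimal sketch, assuming the `paddle.version.cuda()` and `paddle.version.cudnn()` helpers are present in the installed build; `paddle_versions` is a hypothetical helper name and returns `None` if Paddle cannot be imported.

```python
def paddle_versions():
    """Return the Paddle, CUDA and cuDNN versions reported by the installed package."""
    try:
        import paddle
    except ImportError:
        return None
    return {
        "paddle": paddle.__version__,
        # These report the versions the wheel was built against,
        # which can differ from the driver installed on the machine.
        "cuda": paddle.version.cuda(),
        "cudnn": paddle.version.cudnn(),
    }


if __name__ == "__main__":
    print(paddle_versions())
```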

nissansz commented 6 months ago

PaddleOCR 2.7.5, CUDA 10

TingquanGao commented 6 months ago

Which version of paddlepaddle-gpu? Recent paddle releases no longer support CUDA 10.

nissansz commented 6 months ago

2.3.2

nissansz commented 6 months ago

CUDA 10.2 still seems to work for training.

TingquanGao commented 6 months ago

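Paddle 2.3.2 does support CUDA 10.2, but 2.3.2 is a very old version; the latest release is now 2.6.1. I checked, and the newest paddle build that supports CUDA 10.2 is the one below. You could reinstall with it and try again: https://paddle-wheel.bj.bcebos.com/2.5.2/linux/linux-gpu-cuda10.2-cudnn7-mkl-gcc8.2-avx/paddlepaddle_gpu-2.5.2.post102-cp38-cp38-linux_x86_64.whl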

nissansz commented 6 months ago

I'm on Windows 10.

TingquanGao commented 6 months ago

https://paddle-wheel.bj.bcebos.com/2.5.2/windows/windows-gpu-cuda10.2-cudnn7.6.5-mkl-avx-vs2017/paddlepaddle_gpu-2.5.2.post102-cp38-cp38-win_amd64.whl
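
The two links in this exchange differ only in their platform part (linux_x86_64 versus win_amd64), and a wheel's file name encodes which interpreter and OS it targets. A minimal sketch of reading those tags with the standard library only; `parse_wheel_name` and `platform_matches` are hypothetical helpers, and the OS check is a rough prefix comparison rather than the full compatibility logic that pip applies.

```python
import platform


def parse_wheel_name(filename):
    """Split a wheel file name into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def platform_matches(platform_tag, system=None):
    """Rough check that a platform tag targets the given (or current) OS."""
    system = system or platform.system()
    prefixes = {
        "Windows": ("win",),
        "Linux": ("linux", "manylinux"),
        "Darwin": ("macosx",),
    }
    return platform_tag.startswith(prefixes.get(system, ()))


if __name__ == "__main__":
    wheel = parse_wheel_name("paddlepaddle_gpu-2.5.2.post102-cp38-cp38-win_amd64.whl")
    print(wheel, platform_matches(wheel["platform"]))
```

Here `cp38` means the wheel is built for CPython 3.8, which agrees with the Python38 paths in the tracebacks in this thread.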

nissansz commented 6 months ago

Thanks. Does this improve accuracy?

Also, with this version, can I load images and run recognition while training at the same time?

TingquanGao commented 6 months ago

It has no effect on accuracy. The error has to be resolved first.

nissansz commented 6 months ago

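If I want to run recognition during training, before the images are loaded, which recognition method should I use?

It should release GPU memory once recognition is done, so that it does not keep occupying it.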

TingquanGao commented 6 months ago

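The current suspicion is that the paddle version is causing GPU memory not to be released properly, which is why I'd like you to try a different paddle version.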

nissansz commented 6 months ago

I'll install it and give it a try.

Which is more accurate, resnet34 or svtr? svtr seems to train very slowly.

nissansz commented 6 months ago

I installed 2.5.2 and it failed immediately with an error.

File "C:\F\pycharm2020.2\PaddleOCR-2.7.5\ppocr\data\simple_dataset.py", line 36, in <module>
    from .textline2png_pillow_CharacterCheck3shadow24 import text2png_supGenerator
File "C:\F\pycharm2020.2\PaddleOCR-2.7.5\ppocr\data\textline2png_pillow_CharacterCheck3shadow24.py", line 20, in <module>
    from straug.blur import GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur
File "C:\Program Files\Python38\lib\site-packages\straug\blur.py", line 21, in <module>
    import torchvision.transforms as transforms
File "C:\Program Files\Python38\lib\site-packages\torchvision\__init__.py", line 4, in <module>
    from .extension import _HAS_OPS
File "C:\Program Files\Python38\lib\site-packages\torchvision\extension.py", line 6, in <module>
    import torch
File "C:\Program Files\Python38\lib\site-packages\torch\__init__.py", line 197, in <module>
    from torch._C import *  # noqa: F403
RuntimeError: generic_type: type "_CudaDeviceProperties" is already registered!

nissansz commented 6 months ago

I went back to 2.3.2, and recognition worked at first. Below are some of the results, excluding those below 0.15 and above 0.95.

result5-3-[[('Co', 0.07527747005224228)]]
result14-[[('收縮自毛YEVGE', 0.9239646196365356)]]
result14-[[('ぬら化。穷촬ャ转。0/27', 0.7019428610801697)]]
result14-[[('PEDEPIO', 0.8143212199211121)]]
result14-[[(' Basest 衰替 1:6', 0.935200572013855)]]
result14-[[('WITTOP', 0.4973234236240387)]]
result5-[[('hebbenrean1l', 0.5594083666801453)]]
result14-[[('蜂튎丣值解!つ', 0.8142125010490417)]]
result14-[[('GYPSY DESC', 0.9921039342880249)]]
result4-1-[[('GYPSY DESC', 0.9921039342880249)]]
result4-2-[[('GYPSY DESC', 0.9954227209091187)]]
result4-3-[[('GYPSY DESC', 0.9990944862365723)]]
result4-4-[[('GYPSY DESC', 0.996280312538147)]]
result14-[[('汗汁,cominte', 0.9927375912666321)]]
result4-1-[[('汗汁,cominte', 0.9927375912666321)]]
result4-2-[[('汗汁,cominte', 0.99488765001297)]]
result14-[[('FRom COFfINs,', 0.9780336618423462)]]
result4-1-[[('FRom COFfINs,', 0.9780336618423462)]]
result14-[[('용떼물뜻"수락하면。0-%l3', 0.7857113480567932)]]
result5-[[('reeetlea', 0.3912365734577179)]]
result14-[[('Geitzkear', 0.9997942447662354)]]
result4-1-[[('Geitzkear', 0.9997942447662354)]]
result4-2-[[('Geitzkear', 0.9999937415122986)]]
result4-3-[[('Geitzkear', 0.9998283386230469)]]
result4-4-[[('Geitzkear', 0.9999485015869141)]]
result4-5-[[('Geitzkear', 0.999990701675415)]]
result4-6-[[('Geitzkear', 0.9995196461677551)]]
result4-7-[[('Geitzkear', 0.9997355937957764)]]
result4-8-[[('Geitzkear', 0.9976885914802551)]]
result4-9-[[('Geitzkear', 0.990534245967865)]]
result4-10-[[('Geitzkear', 0.9999623894691467)]]
result14-[[('rub$blespritzd”e', 0.9979126453399658)]]
result4-1-[[('rub$blespritzd”e', 0.9979126453399658)]]
result5-[[('Dno', 0.29163801670074463)]]
result14-[[('붾썰·어둡겠어.', 0.839163601398468)]]
result14-[[('鲽-蔼鈉排,抜,か', 0.7933483123779297)]]
result14-[[('뚾掱武몰鞠', 0.6519795060157776)]]
result14-[[('ふな：けやばけbp19', 0.530071496963501)]]
result14-[[('Neave,흉', 0.9968245625495911)]]
result4-1-[[('Neave,흉', 0.9968245625495911)]]
result4-2-[[('Neave,흉', 0.9508001208305359)]]
result5-[[("oUeNeracRUrH's", 0.2726333439350128)]]
result14-[[('穆托姆博,糺氛롛', 0.9635410308837891)]]
result4-1-[[('穆托姆博,糺氛롛', 0.9635410308837891)]]
result14-[[('Psychia tri', 0.9884135723114014)]]
result4-1-[[('Psychia tri', 0.9884135723114014)]]
result14-[[('部垢텗騰},', 0.6940834522247314)]]
result14-[[('海空通光', 0.27336710691452026)]]
result14-[[('壞#요Iざっひ', 0.35629844665527344)]]
result14-[[('举箱?千州?0,2', 0.6918652057647705)]]
result5-[[('glooly', 0.5284292697906494)]]
result14-[[('S eibel户位炉房妒马产', 0.8280718922615051)]]
result14-[[('툖駕[领0。기념식', 0.6854994297027588)]]
result14-[[('没嚆（誠갡吕格正0：', 0.7484003901481628)]]
result5-[[('s1253483', 0.7288013696670532)]]
result14-[[('6#76#7', 0.9951172471046448)]]
result4-1-[[('6#76#7', 0.9951172471046448)]]
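
The thresholds mentioned in this comment (drop scores below 0.15 and above 0.95) can be applied with a small filter. A minimal sketch; `keep_mid_confidence` is a hypothetical helper, and the sample rows are copied from the posted output. Several of the posted entries lie outside that range, so the filter may not have been applied to this particular dump.

```python
def keep_mid_confidence(results, low=0.15, high=0.95):
    """Keep (text, score) pairs whose score lies within [low, high]."""
    return [(text, score) for text, score in results if low <= score <= high]


# Three rows taken from the posted results.
sample = [
    ("Co", 0.07527747005224228),          # below 0.15, dropped
    ("PEDEPIO", 0.8143212199211121),      # kept
    ("GYPSY DESC", 0.9921039342880249),   # above 0.95, dropped
]

if __name__ == "__main__":
    print(keep_mid_confidence(sample))
```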

nissansz commented 6 months ago

After a while, the error came back.

tink2123 commented 5 months ago

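A few things you can try:

1. First, upgrade to the latest develop.
2. Try switching the GPU memory allocator: export FLAGS_allocator_strategy=naive_best_fit && export FLAGS_fraction_of_gpu_memory_to_use=0.92. With this enabled, a large amount of GPU memory is claimed up front and there is no auto growth afterwards.
3. If 1 doesn't help, try export FLAGS_allocator_strategy=auto_growth && export FLAGS_eager_delete_tensor_gb=2 (GC is triggered when GPU memory is below 2 GB).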

nissansz commented 5 months ago

Where do I add this export? Does it work on Windows too?

tink2123 commented 5 months ago

export is the command for setting environment variables; on Windows, use set.
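
As an alternative to export or set, the same flags can be placed into the process environment from Python, which behaves the same on Windows and Linux. A minimal sketch, assuming Paddle reads `FLAGS_*` environment variables when it is first imported, so the values must be in place before `import paddle`. `set_paddle_flags` and the two preset dictionaries are hypothetical names; the flag names and values are the ones suggested earlier in this thread.

```python
import os


def set_paddle_flags(flags, environ=None):
    """Copy FLAGS_* settings into an environment mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    for name, value in flags.items():
        if not name.startswith("FLAGS_"):
            raise ValueError("not a Paddle flag name: " + name)
        # Environment variables are always strings.
        env[name] = str(value)
    return env


# Option 2 from the thread: claim a fixed share of GPU memory up front.
NAIVE_BEST_FIT = {
    "FLAGS_allocator_strategy": "naive_best_fit",
    "FLAGS_fraction_of_gpu_memory_to_use": 0.92,
}

# Option 3 from the thread: grow on demand with a garbage-collection threshold.
AUTO_GROWTH = {
    "FLAGS_allocator_strategy": "auto_growth",
    "FLAGS_eager_delete_tensor_gb": 2,
}

# Usage (at the very top of the entry script, before Paddle is imported):
#   set_paddle_flags(AUTO_GROWTH)
#   import paddle
```

In a Windows command prompt, the direct equivalent of the export lines would be `set FLAGS_allocator_strategy=auto_growth`, entered in the same window before starting Python.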

PaddlePaddle / PaddleOCR

Can this code fix running out of GPU memory? Which files should it be added to? #11907
