PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
42.44k stars, 7.66k forks

Training with the svtr yml runs out of GPU memory, even with batch_size=1 #12517

Closed: nissansz closed this issue 3 months ago

nissansz commented 3 months ago

Problem Description

Runtime Environment

Reproduction Code

Complete Error Message

Possible solutions

Appendix

Out of memory error on GPU 0. Cannot allocate 120.000000MB memory on GPU 0, 7.953613GB memory has been allocated and available memory is only 47.500000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

    (at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:87) . (at ..\paddle\fluid\imperative\tracer.cc:307)

C:\F\pycharm2020.2\PaddleOCR-2.7.5>

GreatV commented 3 months ago

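Cause: other programs or processes may be using GPU memory. Solution: run the nvidia-smi command to check current GPU usage and find the processes that are occupying GPU memory. Kill any unnecessary processes to free up GPU memory.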
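The check suggested above can also be scripted. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration and are not part of PaddleOCR. It asks `nvidia-smi` for the compute processes on the GPU and parses the CSV output.

```python
import subprocess

# nvidia-smi query for processes that currently hold a compute context on the GPU.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # not a "pid, name, memory" row
        # Split from both ends so a comma inside the process path is preserved.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        pid, name, used = pid.strip(), name.strip(), used.strip()
        if not pid.isdigit():
            continue
        # Some driver modes report a non-numeric placeholder instead of a number.
        used_mib = int(used) if used.isdigit() else None
        rows.append((int(pid), name, used_mib))
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_gpu_processes()` right before starting training shows whether anything else already holds memory on the card. An empty list suggests the out-of-memory error comes from the training job itself.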

nissansz commented 3 months ago

Nothing else is using the GPU. Training works when I switch to the resnet34 yml, but fails as soon as I switch to svtr.

nissansz commented 3 months ago

image

nissansz commented 3 months ago

./configs/rec/rec_svtrnet_ch.yml trains fine with this config. ch_PP-OCRv4_rec_svtr_large4lan.yml also trains fine. Is there something wrong with this config? I saw this issue: https://github.com/PaddlePaddle/PaddleOCR/issues/12440

./configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml runs out of GPU memory even with batch size = 1. How should I modify this config so that it can train?
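One thing worth checking, as an assumption not confirmed anywhere in this thread, is whether the failing config sets the batch size in more than one place, so that editing a single `batch_size` field does not change the effective value. The sketch below lists every batch-size-like key in a config file. The helper `find_batch_settings` is hypothetical, and the `_bs` pattern is a guess at how such keys might be named.

```python
import re

# Matches "key: value" lines whose key contains "batch_size" or "_bs".
# The "_bs" pattern is an assumption about possible key names, not a documented list.
BATCH_KEY = re.compile(
    r"^\s*([A-Za-z_]*(?:batch_size|_bs)[A-Za-z_]*)\s*:\s*(.+?)\s*$"
)


def find_batch_settings(yaml_text):
    """Return (line_number, key, raw_value) for each batch-size-like key."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore trailing YAML comments
        match = BATCH_KEY.match(code)
        if match:
            hits.append((lineno, match.group(1), match.group(2)))
    return hits
```

It can be run as `find_batch_settings(open(path, encoding="utf-8").read())`. If one hit's value is a YAML alias such as `*name`, the real number is defined wherever the matching `&name` anchor appears, and that is the line to lower. If the scan finds only one setting and it is already 1, reducing the input image size in the config is the next thing to try.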