PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

How to enable GPU? #1814

Closed rp-koayst closed 3 years ago

rp-koayst commented 3 years ago

Hi,

I have CUDA 10.1 installed on my machine and am looking for instructions on how to build the model with a GPU.

Is CUDA 10.1 OK for building the model?

Step 1: I installed PaddlePaddle Fluid v2.0 following the instructions here: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/installation_en.md

If you have cuda9 or cuda10 installed on your machine, please run the following command to install

python3 -m pip install paddlepaddle-gpu==2.0rc1 -i https://mirror.baidu.com/pypi/simple

Step 2:

Then I ran the code under "1. Use by code": https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/whl_en.md

It didn't run properly. The error was a CUDA version mismatch. Is it because of 10 vs 10.1?

Thanks.

xxxpsyduck commented 3 years ago

  1. Using Docker is always recommended, since it sidesteps most CUDA-related problems.
  2. Uninstall the installed paddle and try: python3 -m pip install paddlepaddle-gpu==2.0.0rc1.post101 -f https://paddlepaddle.org.cn/whl/stable.html
  3. Check your CUDA installation.
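
For the last point, a minimal Python sketch of such a check. It assumes `nvcc` is on PATH and that `paddle.is_compiled_with_cuda()` is available in the installed Paddle release; the helper names are made up for illustration and are not part of PaddleOCR.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the CUDA toolkit release (e.g. '10.1') found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version` if nvcc is on PATH and return the release string, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


def paddle_cuda_support():
    """Return True/False if paddle imports cleanly, or None if it is not installed."""
    try:
        import paddle  # imported lazily so the other helpers work without paddle
    except ImportError:
        return None
    return paddle.is_compiled_with_cuda()
```

Calling `installed_cuda_release()` and `paddle_cuda_support()` before training shows whether the toolkit release on the machine and the installed wheel are both GPU-capable. On the machine in this thread the first call would be expected to report 10.1, which is what the post101 suffix in the pip command above appears to target.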

rp-koayst commented 3 years ago

  1. I can't use Docker, as I need to retrain the model with specific images.
  2. I uninstalled the previously installed paddlepaddle-gpu.
  3. I then installed paddlepaddle-gpu (2.0.0rc1.post101).
  4. I ran the code from https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/whl_en.md
  5. I got the following error:

[2021/01/25 20:12:30] root INFO: train with paddle 2.0.0-rc1 and device CUDAPlace(0)
[2021/01/25 20:12:30] root INFO: Initialize indexs of datasets:['./train_data/icdar2015/text_localization/train_icdar2015_label.txt']
[2021/01/25 20:12:30] root INFO: Initialize indexs of datasets:['./train_data/icdar2015/text_localization/test_icdar2015_label.txt']
W0125 20:12:30.198998 39006 device_context.cc:320] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0125 20:12:30.200832 39006 device_context.cc:330] device: 0, cuDNN Version: 8.0.
[2021/01/25 20:12:33] root INFO: load pretrained model from ['./pretrain_models/MobileNetV3_large_x0_5_pretrained']
[2021/01/25 20:12:33] root INFO: train dataloader has 63 iters, valid dataloader has 500 iters
[2021/01/25 20:12:33] root INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
[2021/01/25 20:12:33] root INFO: Initialize indexs of datasets:['./train_data/icdar2015/text_localization/train_icdar2015_label.txt']
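
The two device_context warnings in this log are the quickest place to read off which CUDA driver, CUDA runtime, and cuDNN versions Paddle actually picked up. A small sketch that pulls them out of a saved log; the function name and dictionary keys are hypothetical, and the regular expressions only assume the wording seen in the log above.

```python
import re

# One or more dot-separated integers, without swallowing a trailing full stop.
_VERSION = r"(\d+(?:\.\d+)*)"


def parse_device_log(log_text):
    """Extract GPU/CUDA/cuDNN versions from Paddle's device_context warnings."""
    patterns = {
        "compute_capability": r"GPU Compute Capability:\s*" + _VERSION,
        "driver_api": r"Driver API Version:\s*" + _VERSION,
        "runtime_api": r"Runtime API Version:\s*" + _VERSION,
        "cudnn": r"cuDNN Version:\s*" + _VERSION,
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        found[key] = match.group(1) if match else None
    return found


SAMPLE = (
    "W0125 20:12:30.198998 39006 device_context.cc:320] Please NOTE: device: 0, "
    "GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1\n"
    "W0125 20:12:30.200832 39006 device_context.cc:330] device: 0, cuDNN Version: 8.0.\n"
)

if __name__ == "__main__":
    print(parse_device_log(SAMPLE))
```

For the log above, the driver and runtime both report 10.1 and cuDNN reports 8.0. Whether that cuDNN version matches what the installed wheel was built against is not stated in the thread and may be worth checking.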


C++ Traceback (most recent call last):

0 paddle::framework::SignalHandle(char const*, int)
1 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1611576758 (unix time) try "date -d @1611576758" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x0) received by PID 39006 (TID 0x7fc4e9e13740) from PID 0 ]
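
The TimeInfo line suggests GNU date for decoding the abort time. The same conversion in Python is short and makes it easier to line the crash up with the log timestamps. A small sketch; the function name is illustrative only.

```python
from datetime import datetime, timezone


def abort_time_utc(unix_seconds):
    """Convert the unix time from Paddle's TimeInfo line to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(abort_time_utc(1611576758).isoformat())
```

For 1611576758 this gives 2021-01-25 12:12:38 UTC, which would line up with the 20:12 log entries if the machine's clock is set to UTC+8 (an assumption; the thread does not state the time zone).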


  6. Then I tried building the model on a single GPU or on multiple GPUs:

python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001

or

python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001

I still get the same error described in number #5.
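
The two commands differ only in the `paddle.distributed.launch` prefix and the `--gpus` list. A sketch that builds either form from the same inputs, which can help when switching between them while debugging. The helper name and its parameters are hypothetical; only the command-line pieces shown in this thread are used.

```python
def build_train_command(config, gpus=None, overrides=None):
    """Build the argv list for tools/train.py, single-GPU or multi-GPU.

    gpus: iterable of GPU ids; more than one id switches to paddle.distributed.launch.
    overrides: dict of config overrides passed after -o, e.g. {"Optimizer.base_lr": "0.0001"}.
    """
    cmd = ["python3"]
    gpu_ids = [str(g) for g in (gpus or [])]
    if len(gpu_ids) > 1:
        cmd += ["-m", "paddle.distributed.launch", "--gpus", ",".join(gpu_ids)]
    cmd += ["tools/train.py", "-c", config]
    if overrides:
        cmd.append("-o")
        cmd += ["{}={}".format(key, value) for key, value in overrides.items()]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_train_command(
        "configs/det/det_mv3_db.yml",
        gpus=[0, 1, 2, 3],
        overrides={"Optimizer.base_lr": "0.0001"},
    )))
```

The returned list can be passed to `subprocess.run`. Because both forms hit the same segmentation fault here, the launcher itself is unlikely to be the cause; the failure more plausibly sits in the GPU setup that both commands share.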

xxxpsyduck commented 3 years ago

You can mount any folder that contains the images you need into the Docker container. Please try using Docker.

rp-koayst commented 3 years ago

I am trying Docker, following this guide: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/installation_en.md

Step:

  1. sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
  2. Do I need to perform step 2 (Install PaddlePaddle Fluid v2.0), step 3 (Clone PaddleOCR repo) and step 4 (Install third-party libraries)?

After the nvidia-docker run command I was at the container prompt, but when I listed the directory with "ls", I didn't see anything related to the PaddleOCR code.
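
The `-v $PWD:/paddle` option in the command above bind-mounts the host directory the command was run from onto /paddle in the container. The PaddleOCR code therefore shows up inside the container only if it was cloned somewhere under that host directory, which fits the guide listing the clone as a separate step. A sketch of the path mapping; the function name and the example paths are hypothetical.

```python
import posixpath


def container_path(host_path, host_mount, container_mount):
    """Map a host path under a bind-mounted directory to its path inside the container."""
    rel = posixpath.relpath(host_path, host_mount)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("{} is not under the mounted directory {}".format(host_path, host_mount))
    return posixpath.normpath(posixpath.join(container_mount, rel))


if __name__ == "__main__":
    # Hypothetical layout: the repo was cloned under the directory passed to -v.
    print(container_path("/home/user/work/PaddleOCR/tools/train.py", "/home/user/work", "/paddle"))
```

A path that raises ValueError here lies outside the mount and will not be visible in the container unless a second `-v` option is added for it. The same applies to a folder of training images.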

xxxpsyduck commented 3 years ago

Please follow the guide step by step. P.S.: Please use # only when you want to reference an issue, PR, or discussion. You really need to learn some basic Docker and Markdown.

rp-koayst commented 3 years ago

Following your recommendation I tried Docker, and it didn't work either. I can't even get the example to run:

Predict a single image specified by image_dir:

python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True

I also tried the Docker images below. All have the same problem: latest-gpu-cuda9.0-cudnn7, latest, latest-gpu-cuda11.0-cudnn8, and latest-gpu-cuda10.1-cudnn7.


python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
Traceback (most recent call last):
  File "tools/infer/predict_system.py", line 30, in <module>
    import tools.infer.predict_det as predict_det
  File "/home/PaddleOCR/tools/infer/predict_det.py", line 31, in <module>
    from ppocr.data import create_operators, transform
  File "/home/PaddleOCR/ppocr/data/__init__.py", line 34, in <module>
    from ppocr.data.imaug import transform, create_operators
  File "/home/PaddleOCR/ppocr/data/imaug/__init__.py", line 19, in <module>
    from .iaa_augment import IaaAugment
  File "/home/PaddleOCR/ppocr/data/imaug/iaa_augment.py", line 20, in <module>
    import imgaug
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/imgaug/__init__.py", line 7, in <module>
    from imgaug.imgaug import *  # pylint: disable=redefined-builtin
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/imgaug/imgaug.py", line 22, in <module>
    import skimage.draw
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/__init__.py", line 127, in <module>
    from .util.dtype import (img_as_float32,
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/util/__init__.py", line 12, in <module>
    from ._montage import montage
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/util/_montage.py", line 2, in <module>
    from .. import exposure
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/exposure/__init__.py", line 1, in <module>
    from .exposure import histogram, equalize_hist, \
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/exposure/exposure.py", line 3, in <module>
    from ..color import rgb2gray
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/color/__init__.py", line 1, in <module>
    from .colorconv import (convert_colorspace,
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/skimage/color/colorconv.py", line 55, in <module>
    from scipy import linalg
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/scipy/__init__.py", line 156, in <module>
    from . import fft
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/scipy/fft/__init__.py", line 76, in <module>
    from ._basic import (
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/scipy/fft/_basic.py", line 1, in <module>
    from scipy._lib.uarray import generate_multimethod, Dispatchable
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/scipy/_lib/uarray.py", line 27, in <module>
    from ._uarray import *
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/scipy/_lib/_uarray/__init__.py", line 114, in <module>
    from ._backend import *
  File "/usr/local/python3.5.1/lib/python3.5/site-packages/scipy/_lib/_uarray/_backend.py", line 1, in <module>
    from typing import (


All of the images have some sort of issue: either the example doesn't run or training fails.

One question: inside the Docker container, I noticed directories for several Python versions, for example "Python-3.6.0". Am I supposed to run, install, or otherwise do something with them?
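
The traceback above is cut off before the actual error, but every path in it points at /usr/local/python3.5.1, so one thing worth checking is which interpreter `python3` resolves to inside the container and which others are on PATH. A sketch using only the standard library; the function name and the default list of interpreter names are assumptions, not something taken from the image.

```python
import shutil
import sys


def interpreter_report(candidates=("python3", "python3.5", "python3.6", "python3.7")):
    """Report the running interpreter and where each candidate name resolves on PATH."""
    return {
        "running": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # shutil.which returns None for names that are not on PATH.
        "on_path": {name: shutil.which(name) for name in candidates},
    }


if __name__ == "__main__":
    report = interpreter_report()
    print("running:", report["running"], report["version"])
    for name, path in sorted(report["on_path"].items()):
        print(name, "->", path)
```

Running this with `python3` inside the container shows whether the packages are being installed for, and imported by, the same interpreter.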

rp-koayst commented 3 years ago

After using Docker, I was able to train the model.

asif-ca commented 1 year ago

After using Docker, I was able to train the model.

How did you train the model on custom images? Can you share the process?