Does the text recognition model support training on grayscale images? If not, what needs to be modified?

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

https://paddlepaddle.github.io/PaddleOCR/

Apache License 2.0


Does the text recognition model support training on grayscale images? If not, what needs to be modified? #5147

Closed. UBUNTUHWB closed this issue 2 years ago

UBUNTUHWB commented 2 years ago

, error happened with msg: OpenCV(4.4.0) /tmp/pip-req-build-zeowd5_m/opencv/modules/imgproc/src/color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<3>; VDepth = cv::impl::{anonymous}::Set<0, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'

Invalid number of channels in input image: 'VScn::contains(scn)' where 'scn' is 1

Fatal Python error: Cannot recover from stack overflow. Python runtime state: initialized

andyjiang1116 commented 2 years ago

You need to change image_shape in the corresponding config file to [1, h, w], and also change the input channels (in_channels) of the backbone you are using to 1.

UBUNTUHWB commented 2 years ago

I modified the yml file:

transforms:
  - DecodeImage: # load image
      img_mode: BGR
      channel_first: False
  - RecAug:
      aug_prob: 0.4
  - CTCLabelEncode: # Class handling label
  - RecResizeImg:
      image_shape: [1, 32, 320]

and base_model.py:

    super(BaseModel, self).__init__()
    in_channels = config.get('in_channels', 1)
    model_type = config['model_type']

The error still occurs:

, error happened with msg: could not broadcast input array from shape (32,256,3) into shape (1,32,256) [2021/12/31 09:58:07] root ERROR: When parsing line
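The broadcast error above can be reproduced outside PaddleOCR with NumPy alone, which makes the shape mismatch easier to see. This is a minimal sketch: the variable names and the all-zero arrays are illustrative assumptions, and the channel-mean conversion is only a stand-in for a proper grayscale conversion.

```python
import numpy as np

# Target buffer in channel-first layout, as configured: [C, H, W] = [1, 32, 320].
padding_im = np.zeros((1, 32, 320), dtype=np.float32)

# A resized image that is still height x width x 3 channels (HWC).
resized_image = np.zeros((32, 256, 3), dtype=np.float32)
resized_w = resized_image.shape[1]

try:
    # Writing an HWC color image into a CHW single-channel slice fails.
    padding_im[:, :, 0:resized_w] = resized_image
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Reducing to one channel and moving the channel axis to the front makes it fit.
gray_chw = resized_image.mean(axis=2)[np.newaxis, :, :]  # shape (1, 32, 256)
padding_im[:, :, 0:resized_w] = gray_chw
```

The first assignment fails because the image has not been reduced to a single channel, not because of the width: the 256-pixel-wide image fits into the 320-pixel-wide buffer once its layout is (1, 32, 256).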

andyjiang1116 commented 2 years ago

in_channels should be changed in the corresponding backbone file, not in base_model.

andyjiang1116 commented 2 years ago

For example, here: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppocr/modeling/backbones/rec_mobilenet_v3.py#L24 Alternatively, add a parameter to the corresponding config file. For example, after this line: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/rec/rec_mv3_none_bilstm_ctc.yml#L38 add the parameter in_channels: 1

UBUNTUHWB commented 2 years ago

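Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: ResNet
    in_channels: 1
    layers: 34

I set in_channels: 1 in the yml file, and debugging confirms the value is read from the config, so why does the model still end up with in_channels: 3 at run time?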

Also, in rec_img_aug.py, img and img_shape have different channel counts, so the line padding_im[:, :, 0:resized_w] = resized_image raises an exception.
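One possible explanation for the in_channels question in this comment, not confirmed anywhere in the thread: the base_model.py snippet quoted earlier reads in_channels from the top level of the Architecture config with config.get. If the model builder then passes that value down to the backbone, an in_channels key nested under Backbone would be parsed correctly but overridden before the backbone is built. The sketch below illustrates that pattern with plain dictionaries. build_backbone_config is a hypothetical helper, not PaddleOCR code, and whether your PaddleOCR version behaves this way should be checked in base_model.py.

```python
def build_backbone_config(arch_config, default_in_channels=3):
    """Hypothetical stand-in for a model builder; not PaddleOCR's actual code."""
    # The top-level value (or the default) is read first ...
    in_channels = arch_config.get("in_channels", default_in_channels)
    backbone_config = dict(arch_config["Backbone"])
    # ... and then written over whatever the Backbone section specified.
    backbone_config["in_channels"] = in_channels
    return backbone_config


arch = {
    "model_type": "rec",
    "algorithm": "CRNN",
    "Backbone": {"name": "ResNet", "in_channels": 1, "layers": 34},
}

# in_channels only under Backbone: the nested value is overridden by the default.
nested_only = build_backbone_config(arch)

# in_channels at the top level of Architecture: the value reaches the backbone.
top_level = build_backbone_config({**arch, "in_channels": 1})

print(nested_only["in_channels"], top_level["in_channels"])
```

If this is what happens, printing the config that the backbone constructor actually receives would show where the value changes.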

andyjiang1116 commented 2 years ago

Isn't your data in grayscale format? img should have 1 channel. If img has 3 channels, your image is still a color image.

UBUNTUHWB commented 2 years ago

It is grayscale, but after passing through the DecodeImage class the image has 3 channels.

andyjiang1116 commented 2 years ago

You can refer to how the NRTR model handles this: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppocr/data/imaug/rec_img_aug.py#L46 That model is trained on grayscale images, and its config file is https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/configs/rec/rec_mtb_nrtr.yml
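Since the decoded image arrives with 3 channels even when the file is grayscale, the resize step has to reduce it to one channel itself. The following is a rough NumPy-only sketch of that idea, not the NRTR code from the repository: to_gray_chw is a hypothetical helper, the target size (32, 100) and the /128 - 1 scaling are assumptions, and the nearest-neighbour resize stands in for a real cv2 or PIL resize.

```python
import numpy as np


def to_gray_chw(img_bgr, target_hw=(32, 100)):
    """Hypothetical helper: BGR HWC uint8 image -> float32 array of shape (1, H, W)."""
    # Weighted sum over the channel axis, with luma weights in B, G, R order.
    weights = np.array([0.114, 0.587, 0.299], dtype=np.float32)
    gray = img_bgr.astype(np.float32) @ weights  # shape (H, W)

    # Nearest-neighbour resize via index arrays (stand-in for a library resize).
    h, w = gray.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    resized = gray[rows][:, cols]  # shape (th, tw)

    # Add the channel axis in front, then scale pixel values to about [-1, 1).
    return resized[np.newaxis, :, :] / 128.0 - 1.0


# Usage: a 3-channel image, as produced by decoding, becomes a 1-channel input.
img = np.full((48, 160, 3), 255, dtype=np.uint8)
out = to_gray_chw(img)
print(out.shape, out.dtype)
```

With the data pipeline producing (1, H, W) arrays, image_shape and the backbone's in_channels then both need to agree on a single channel, as described earlier in the thread.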

paddle-bot-old[bot] commented 2 years ago

Since you haven't replied for more than 3 months, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.