Closed ftmasadi closed 4 months ago
This is my result
And some labels from my dataset photos:
00010.tif بهمنظور افزایش ظرفیتشان برای استفاده از اندازهگیری میکنند. همچنین میزان بهرهبرداری آنها از در فعالیتهای
00011.tif روزانه در ارتباط با دیگر گروهها مورد اندازهگیری قرار میگیرد. به طور کلی نیز شاخص
@MohieEldinMuha I think maybe I'm making a mistake with the shape of my pictures. I saw that the text in your dataset photos is long. What shape did you choose? Thank you for helping me with this problem.
out_char_num should be set to W//4, W//8 or W//12, and max_text_length should be set the same as out_char_num.
Long Text Recognition Optimization
Recognition scenes in which most texts contain more than 15 characters are considered long text recognition, for example Chinese sentence-level text recognition.
Based on experience, the resize shape can be set according to the aspect-ratio distribution of the text images during pre-processing, i.e., set to the aspect ratio that covers 90% of the data. For example, if after collecting the aspect ratios of the dataset you find that 90% of the data has an aspect ratio of less than 15:1, the resize shape should be set to [H, H*15]. With H set to 32, that is [32, 480].
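The 90%-coverage rule above can be sketched in a few lines of Python. The function name `percentile_aspect_ratio` and the sample `sizes` list are illustrative assumptions, not part of PaddleOCR:

```python
# Sketch: pick a resize width that covers ~90% of the dataset's aspect ratios.
# Assumes `sizes` holds (width, height) pairs collected from your label images.
def percentile_aspect_ratio(sizes, pct=0.90):
    ratios = sorted(w / h for w, h in sizes)
    idx = min(int(pct * len(ratios)), len(ratios) - 1)
    return ratios[idx]

sizes = [(100, 32), (300, 32), (480, 32), (240, 32), (60, 32)]
ratio = percentile_aspect_ratio(sizes)

H = 32
resize_shape = [H, int(H * ratio)]  # here the widest ratio is 15:1, so [32, 480]
```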
When a text image contains only a few characters, resizing it directly to a relatively large aspect ratio distorts the data and harms training and inference. Therefore, preserve the aspect ratio when resizing: set H to 32; when W < 480, pad with zeros up to W = 480; when W > 480, resize down to W = 480. This retains the original shape of the text image.
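As a minimal sketch of the keep-ratio-then-pad rule (the function name is hypothetical; it only computes the target widths, not the actual image operations):

```python
def keep_ratio_resize_shape(w, h, target_h=32, max_w=480):
    """Return (resized_w, pad_w): resize keeping aspect ratio to target_h,
    then pad with zeros up to max_w; very wide images are capped at max_w."""
    new_w = int(round(w * target_h / h))
    if new_w >= max_w:
        return max_w, 0           # wide image: resize down to max_w, no padding
    return new_w, max_w - new_w   # narrow image: keep its shape, pad the rest

# A 100x50 crop keeps its 2:1 ratio (-> 64px wide) and is padded to 480.
print(keep_ratio_resize_shape(100, 50))   # (64, 416)
# A very wide 1000x32 crop is capped at 480 with no padding.
print(keep_ratio_resize_shape(1000, 32))  # (480, 0)
```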
The sequence length of the SVTR output is 1*W/4 by default, i.e., SVTR downsamples the feature height to 1 and the width to a quarter of W. However, when recognizing long text the width is W=480 or higher, giving a sequence length of 480//4 = 120. In practical applications, text images with 120 characters are rare, so the default structure configuration is clearly unreasonable here. We should therefore make appropriate adjustments to the SVTR structure by changing the value of out_char_num in the SVTR backbone section of the configuration file. Based on experience, out_char_num is generally set to 1/4, 1/8, or 1/12 of W.
In this project, W=320 and most texts contain no more than 40 characters, so we set the output sequence length in the configuration file as follows:
```yaml
Backbone:
  name: SVTRNet
  img_size:
    - 32
    - 320
  out_char_num: 40 # W//4 or W//8 or W//12
  ...
```
In short text recognition scenes, out_char_num is generally set to 1/4 of W; in long text recognition scenes, it is generally set to 1/8 or 1/12 of W, according to actual needs. When W=480, out_char_num can be set to 60 or 40; if the data contains texts with more than 40 characters, set it to 60.
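These rules of thumb can be captured in a tiny helper (a hypothetical function, not a PaddleOCR API):

```python
def suggest_out_char_num(w, divisor=8):
    """Suggest out_char_num from the resize width W.

    divisor=4 suits short text scenes; 8 or 12 suit long text scenes,
    per the guidance above.
    """
    return w // divisor

print(suggest_out_char_num(480, 8))   # 60  (long text, texts over 40 chars)
print(suggest_out_char_num(480, 12))  # 40
print(suggest_out_char_num(320, 8))   # 40  (the value used in this project)
```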
@Topdu My dataset samples have dimensions (30, 200, 3). How should I set the hyperparameters in the config file (input image size, output model, or out_char_num)?
Try reversing the labels. If you read the original CRNN paper, the model has an LSTM that reads the timesteps from left to right, and since it uses CTC it predicts characters from left to right.
PP-OCRv3 Arabic does not use CRNN, but I think this may still help.
wrong label: اکبر corrected label : ربکا
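The label fix above is a simple string reversal, assuming labels are stored as Unicode strings (the function name is hypothetical):

```python
def reverse_rtl_label(label):
    # Reverse a right-to-left label so CTC sees characters in left-to-right
    # visual order, matching how the recognizer scans the image.
    return label[::-1]

print(reverse_rtl_label("اکبر"))  # -> ربکا, as in the example above
```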
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
It has not been updated for a long time. This issue is temporarily closed and can be reopened if necessary.
I use the following YAML file. I even spent a few days training it, but no matter what I change, it gives me no accuracy. Thank you for helping me; where is the problem with my work?

```yaml
Global:
  use_gpu: true
  epoch_num: 5000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: /content/drive/MyDrive/output
  save_epoch_step: 1
  eval_batch_step:
```