PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Getting zero accuracy acc = 0 #9000

Closed · ftmasadi closed this issue 4 months ago

ftmasadi commented 1 year ago

I am using the following YAML file. I even spent a few days training with it, but no matter what I change, it never gives me any accuracy. Thank you for helping me; where is the problem with my setup?

  Global:
    use_gpu: true
    epoch_num: 5000
    log_smooth_window: 20
    print_batch_step: 10
    save_model_dir: /content/drive/MyDrive/output
    save_epoch_step: 1
    eval_batch_step:

ftmasadi commented 1 year ago

This is my result (screenshots attached).

ftmasadi commented 1 year ago

And some labels from my dataset photos (the labels are Persian, i.e. right-to-left text):

  00010.tif	به‌منظور افزایش ظرفیتشان برای استفاده از اندازه‌گیری می‌کنند. همچنین میزان بهره‌برداری آن‌ها از در فعالیت‌های
  00011.tif	روزانه در ارتباط با دیگر گروه‌ها مورد اندازه‌گیری قرار می‌گیرد. به طور کلی نیز شاخص

ftmasadi commented 1 year ago

@MohieEldinMuha I think maybe I'm making a mistake with the shape of my images. I saw that the text in your dataset photos is long. What shape did you choose? Thank you for helping me with this problem.

Topdu commented 1 year ago

out_char_num should be set to W//4, W//8 or W//12, and max_text_length should be set to the same value as out_char_num.

Long Text Recognition Optimization

Scenes where most texts contain more than 15 characters are considered long-text recognition, for example Chinese sentence-level text recognition.

According to experience, the resize shape can be set from the aspect-ratio distribution of the text images during pre-processing, i.e., set to an aspect ratio that covers 90% of the data. For example, if, after collecting the aspect ratios of the dataset, 90% of the data has an aspect ratio of less than 15:1, the resize shape should be set to [H, H*15]. With H set to 32, that is [32, 480].
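
A quick way to apply this rule to your own data is to collect the width/height ratio of every sample and take the 90th percentile. The sketch below is only an illustration: it assumes a PaddleOCR-style label file with the image path and label separated by a tab, and the paths and function name are hypothetical.

    import os
    import numpy as np
    from PIL import Image

    def suggest_resize_shape(label_file, image_root, height=32, percentile=90):
        """Return [H, W] where W = H * (percentile-th aspect ratio of the dataset)."""
        ratios = []
        with open(label_file, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                img_name = line.strip().split("\t")[0]
                with Image.open(os.path.join(image_root, img_name)) as im:
                    w, h = im.size  # PIL gives (width, height)
                ratios.append(w / h)
        ratio = float(np.percentile(ratios, percentile))
        return [height, int(round(height * ratio))]

    # Hypothetical paths; if 90% of samples are narrower than 15:1, this yields [32, 480].
    # print(suggest_resize_shape("train_list.txt", "train_images/"))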

If some texts in the data contain only a few characters, resizing them directly to a relatively large aspect ratio distorts them and hurts training and inference. Therefore, preserve the aspect ratio first when resizing, with H set to 32: when W < 480, pad with zeros up to W = 480; when W > 480, resize to W = 480. This keeps the original shape of the text image as much as possible.
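
The resize-then-pad step described above could be sketched roughly like this; this is a minimal OpenCV example for 3-channel images under the stated assumptions, not the exact PaddleOCR preprocessing operator.

    import cv2
    import numpy as np

    def resize_with_pad(img, target_h=32, target_w=480):
        """Resize keeping the aspect ratio, then pad with zeros up to target_w."""
        h, w = img.shape[:2]
        new_w = int(round(w * target_h / h))
        if new_w >= target_w:
            # Very wide images: resize directly to the full target width.
            return cv2.resize(img, (target_w, target_h))
        resized = cv2.resize(img, (new_w, target_h))
        padded = np.zeros((target_h, target_w, 3), dtype=resized.dtype)
        padded[:, :new_w] = resized  # original text stays at the left, zeros on the right
        return padded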

Adjust the output sequence length

The sequence length of the SVTR output is 1*W/4 by default, i.e., SVTR downsamples the feature height to 1 and the width to a quarter of W. However, when recognizing long text the width is W = 480 or more, so the sequence length becomes 480//4 = 120. In practical applications there are few text images containing 120 characters, so the default structure configuration is clearly unreasonable here. Therefore, the SVTR structure should be adjusted appropriately by changing the value of out_char_num in the SVTR backbone configuration in the config file. According to experience, out_char_num is generally set to 1/4, 1/8 or 1/12 of W.

In this project, W = 320 and most of the text contains no more than 40 characters, so we adjust the output sequence length in the configuration file as follows:

  Backbone:
    name: SVTRNet
    img_size:
    - 32
    - 320
    out_char_num: 40 # W//4 or W//8 or W//12
    ...

In short-text recognition scenes, out_char_num is generally set to 1/4 of W; in long-text recognition scenes, it is generally set to 1/8 or 1/12 of W, depending on actual needs. When W = 480, out_char_num can be set to 60 or 40; if the data contains texts with more than 40 characters, out_char_num should be set to 60.
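
As a rough sanity check (this is general CTC behaviour rather than anything specific to this thread): the output sequence length must be at least as long as the longest label, otherwise CTC cannot align the transcription and accuracy stays at 0. A minimal helper with a hypothetical name:

    def pick_out_char_num(img_w, max_label_len, factor=8):
        """Pick out_char_num = W // factor and verify it covers the longest label."""
        out_char_num = img_w // factor
        if out_char_num < max_label_len:
            raise ValueError(
                f"out_char_num={out_char_num} is shorter than the longest label "
                f"({max_label_len} chars); use a smaller factor or a larger W."
            )
        return out_char_num

    # pick_out_char_num(320, 40, factor=8)  -> 40, matching the config above
    # pick_out_char_num(480, 40, factor=12) -> 40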

Aminfaraji commented 1 year ago

@Topdu My dataset samples have dimensions (30, 200, 3). How should I set the hyperparameters in the config file (input image shape, model output, or out_char_num)?

masoudMZB commented 1 year ago

Try reversing the labels. If you read the original CRNN paper, the model has an LSTM that reads the time steps from left to right, and since it uses CTC it also predicts characters from left to right.

PP-OCRv3's Arabic model does not use CRNN, but I think this may still help.

Wrong label: اکبر → corrected label: ربکا
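
If you want to try this on a PaddleOCR-style recognition label file (one "image path<TAB>label" per line), a minimal sketch might look like the following; the file names are hypothetical, and reversing the raw Unicode code points is naive (it ignores digits and combining marks), so spot-check a few lines by hand.

    def reverse_labels(src_path, dst_path):
        """Reverse the text part of each 'image\\tlabel' line (naive code-point reversal)."""
        with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                line = line.rstrip("\n")
                if not line:
                    continue
                img, label = line.split("\t", 1)
                dst.write(f"{img}\t{label[::-1]}\n")

    # reverse_labels("train_list.txt", "train_list_reversed.txt")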

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

UserWangZz commented 4 months ago

It has not been updated for a long time. This issue is temporarily closed and can be reopened if necessary.