Which data is good to train?

clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019

Apache License 2.0


Which data is good to train? #258

Open kimlia545 opened 3 years ago

kimlia545 commented 3 years ago

The paper says: "This result showed that the diversity of training data can be more important than the number of training examples, and that the effects of using different training datasets is more complex than simply concluding more is better." You used MJSynth and SynthText in combination. I want to train on Korean-language data. Should I use data with varied colors, fonts, backgrounds, widths, gradients, distortions, and blurs?

yakhyo commented 3 years ago

I don't think using RGB images will help, because the network input has one channel by default. However, you can change that by setting opt.rgb=True.
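To make the point above concrete, here is a minimal sketch of how an RGB switch could map to the number of input channels. It assumes the option behaves as described in this comment (one channel by default, three when RGB is enabled). The names `build_parser`, `resolve_input_channels`, and the `--input_channel` argument are hypothetical and used only for illustration; check the repository's own training script for the actual option handling.

```python
import argparse


def build_parser():
    """Hypothetical parser; only mirrors the rgb option mentioned above."""
    parser = argparse.ArgumentParser()
    # Assumption: RGB is off by default, so the network sees grayscale input.
    parser.add_argument("--rgb", action="store_true", help="use RGB input")
    # Assumption: one input channel unless RGB is requested.
    parser.add_argument("--input_channel", type=int, default=1)
    return parser


def resolve_input_channels(opt):
    """Return the channel count the network input layer would need."""
    if opt.rgb:
        opt.input_channel = 3  # RGB images carry three channels
    return opt.input_channel


# Default run: grayscale input.
gray_opt = build_parser().parse_args([])
gray_channels = resolve_input_channels(gray_opt)

# Run with RGB enabled.
rgb_opt = build_parser().parse_args(["--rgb"])
rgb_channels = resolve_input_channels(rgb_opt)

print(gray_channels, rgb_channels)
```

With grayscale input, color variety in the training images is reduced to intensity differences, so enabling RGB is what would let color diversity reach the network at all.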

kimlia545 commented 3 years ago

@yakhyo Thanks

bit-scientist commented 2 years ago

Hi @kimlia545, do you happen to have a pretrained model for Korean (or Korean + English) that you can share? Since their site only supports around 10 tests per day, I would like to have a separate model on-premises. Thank you!