clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0
3.75k stars, 1.1k forks

Overfitting quickly vs. Data Augmentation #244

Open macqueen09 opened 3 years ago

macqueen09 commented 3 years ago

I have almost 250,000 images for handwritten digit recognition: 50,000 are real and the rest are simulated. The model overfits very quickly. This happens with both Adam (lr = 0.001) and Adadelta (lr = 1), and with None_ResNet_None_Attn_adam, TPS_Res_BLSTM_CTC_adam, and other configurations. After 5 epochs, accuracy on the eval set (same source as the training data) is 97%, while test accuracy is 65% (it can climb slowly to 80%). Could you suggest a training strategy?

Another question: I couldn't find the code for data augmentation. Should I add it myself? Thanks very much.

macqueen09 commented 3 years ago

The validation set has the same distribution as the training set (it includes my own simulated images). The test set has the same distribution as the real data in the training set (no simulated images).

v-retoux commented 3 years ago

Maybe your simulation is too far from the real data? Do you have an example of both? You can create your own Dataset class to augment data "on the fly", using imgaug for the augmentation, for example.
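A minimal sketch of the "on the fly" idea, under stated assumptions: `AugmentedDataset` and `add_gaussian_noise` are hypothetical names, not part of this repository, and plain NumPy noise stands in for an imgaug pipeline. The wrapper only needs `__len__` and `__getitem__`, which is the interface a map-style PyTorch dataset exposes, so the same pattern should carry over to a `torch.utils.data.Dataset` subclass.

```python
import numpy as np


class AugmentedDataset:
    """Wrap an indexable dataset of (image, label) pairs; augment each image on access."""

    def __init__(self, base, augment, rng=None):
        self.base = base        # anything with __len__ and __getitem__
        self.augment = augment  # callable: (image, rng) -> image
        self.rng = rng if rng is not None else np.random.default_rng()

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        image, label = self.base[index]
        # A fresh random augmentation is drawn on every read, so each epoch
        # sees a different variant of the same stored image.
        return self.augment(image, self.rng), label


def add_gaussian_noise(image, rng, sigma=10.0):
    """Add zero-mean Gaussian noise to a uint8 image and clip back to [0, 255]."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


# Toy stand-in data (an assumption): four flat 32x100 grayscale images with string labels.
samples = [(np.full((32, 100), 128, dtype=np.uint8), str(i)) for i in range(4)]
dataset = AugmentedDataset(samples, add_gaussian_noise, rng=np.random.default_rng(0))

first, label = dataset[0]
second, _ = dataset[0]  # same index, different random noise
```

Because the augmentation is a plain callable, any other transform with the same `(image, rng) -> image` shape could be swapped in without touching the stored data.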

macqueen09 commented 3 years ago

[images: 00000, 00003]

This is a simulated image that combines print and handwriting. The handwritten part comes from a library of almost 20,000 small images; the printed part uses 20 fonts. The simulated images also get some imgaug augmentation, such as noise and small shape deformations. And this is a real image:

[image: 00000] The real images also combine print and handwriting; the handwriting was written by students. @varshaneya [image: 00006]
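For readers who want to see what "noise plus a small deformation" can look like in code, here is a rough NumPy-only sketch. It is not the pipeline used for the images above (that one used imgaug); `small_shear`, `salt_and_pepper`, and `simulate` are hypothetical helpers, and the white background and parameter values are assumptions.

```python
import numpy as np


def small_shear(image, rng, max_shift=3):
    """Slant a 2-D image slightly: shift each row by a linearly growing offset."""
    height, width = image.shape
    # Horizontal offset of the bottom row relative to the top row, in pixels.
    total = int(rng.integers(-max_shift, max_shift + 1))
    out = np.full_like(image, 255)  # assume a white background
    for y in range(height):
        dx = round(total * y / max(height - 1, 1))
        if dx >= 0:
            out[y, dx:] = image[y, :width - dx]
        else:
            out[y, :width + dx] = image[y, -dx:]
    return out


def salt_and_pepper(image, rng, amount=0.02):
    """Set roughly a fraction `amount` of pixels to pure black or pure white."""
    out = image.copy()
    mask = rng.random(image.shape)
    out[mask < amount / 2] = 0
    out[mask > 1 - amount / 2] = 255
    return out


def simulate(image, rng):
    """Apply the deformation first, then the noise."""
    return salt_and_pepper(small_shear(image, rng), rng)


rng = np.random.default_rng(0)
clean = np.full((32, 100), 255, dtype=np.uint8)
clean[8:24, 40:60] = 0  # a black block standing in for a glyph
augmented = simulate(clean, rng)
```

Keeping `max_shift` and `amount` small matters here: the thread's concern is a gap between simulated and real images, and stronger distortions only help if they resemble what the real data contains.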

macqueen09 commented 3 years ago

Yes, the simulated images are still easier than the real ones; that's the problem. But the 50,000 real images can only reach 80% accuracy. Sad.

mjack3 commented 3 years ago

@macqueen09 I need to do some data augmentation on the fly. Could you give me some pointers on how to do it? Thanks.

macqueen09 commented 3 years ago

https://github.com/aleju/imgaug @mjack3 imgaug is a toolbox for common image augmentations.