Open macqueen09 opened 3 years ago
The validation set has the same distribution as the training set (including the data I simulated myself). The test set has the same distribution as the real data in the training set (no simulated data).
Maybe your simulation is too far from the real data? Do you have an example of both? You can create your own Dataset class to augment data "on the fly", using imgaug for the augmentation, for example.
This is a simulated image, combining print and handwriting. The handwriting part comes from a library of almost 20,000 small images. The print comes from 20 fonts. Some imgaug augmentation was applied to the simulated images, such as noise and small shape distortions. And this is a real image:
The real images also combine print and handwriting; the handwriting was written by students. @varshaneya
Yes, the simulated images are still easier than the real ones. That's the problem. But the 50,000 (5W) real images can only reach 80% accuracy. Sad.
@macqueen09 I need to do some data augmentation on the fly. Could you give me some pointers on how to do it? Thanks.
https://github.com/aleju/imgaug @mjack3 imgaug is a toolbox for common image augmentations.
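For anyone else asking the same thing, here is a minimal sketch of the "on the fly" pattern mentioned above: the dataset stores the clean images and applies a fresh random augmentation every time a sample is read, so each epoch sees different variants. Everything in it is an assumption for illustration, not code from this repo: the class name `OnTheFlyAugmentedDataset`, the helpers `add_noise`, `shift_horizontally` and `noise_and_shift`, and the choice to represent images as nested lists of 0-255 grayscale values so the sketch runs with the standard library only. In a PyTorch project the class would typically subclass `torch.utils.data.Dataset`, and the `augment` callable would wrap an imgaug pipeline instead of these stand-in functions.

```python
import random


class OnTheFlyAugmentedDataset:
    """Map-style dataset (__len__ / __getitem__) that augments at read time.

    Hypothetical sketch: it follows the same protocol as a
    torch.utils.data.Dataset but does not depend on PyTorch.
    """

    def __init__(self, images, labels, augment=None, seed=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.augment = augment          # callable(image, rng) -> new image
        self.rng = random.Random(seed)  # private RNG, so a seed is reproducible

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.augment is not None:
            # A new random variant is produced on every access;
            # the stored image itself is never modified.
            image = self.augment(image, self.rng)
        return image, self.labels[index]


def add_noise(image, rng, max_delta=20):
    """Add uniform integer noise to each pixel, clipped to [0, 255]."""
    return [
        [min(255, max(0, px + rng.randint(-max_delta, max_delta))) for px in row]
        for row in image
    ]


def shift_horizontally(image, rng, max_shift=2, fill=255):
    """Shift every row left or right by up to max_shift pixels.

    Assumes max_shift is smaller than the image width.
    """
    shift = rng.randint(-max_shift, max_shift)
    width = len(image[0])
    if shift > 0:
        return [[fill] * shift + row[: width - shift] for row in image]
    if shift < 0:
        return [row[-shift:] + [fill] * (-shift) for row in image]
    return [list(row) for row in image]


def noise_and_shift(image, rng):
    """Stand-in for a real pipeline: small deformation, then noise."""
    return add_noise(shift_horizontally(image, rng), rng)


# Usage: one uniform gray 4x8 "image" with the label "7".
images = [[[128] * 8 for _ in range(4)]]
labels = ["7"]
dataset = OnTheFlyAugmentedDataset(images, labels, augment=noise_and_shift, seed=0)

first, label = dataset[0]   # one random variant
second, _ = dataset[0]      # same index, independently re-augmented
```

Because the augmentation runs inside `__getitem__`, nothing extra is written to disk and the set of variants is not fixed in advance. One reasonable setup is to augment only the training split and leave the validation and test datasets with `augment=None`, so evaluation stays on unmodified images.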
I have almost 250,000 (25W) images for handwritten digit recognition; 50,000 (5W) are real and the rest are simulated. The model overfits very fast, with both Adam (lr = 0.001) and Adadelta (lr = 1), and with None_ResNet_None_Attn_adam, TPS_Res_BLSTM_CTC_adam, or other configurations. After 5 epochs, accuracy on the same-distribution eval set is 97% while test accuracy is 65% (it can go up slowly to 80%). Do you have any suggestions for a training strategy?
Another question: I couldn't find the code for data augmentation. Should I add it myself? Thanks very much.