Belval / TextRecognitionDataGenerator

A synthetic data generator for text recognition
MIT License

Can this code generate background noise similar to scanned images (300 dpi)? #80

Open sdzbft opened 5 years ago

sdzbft commented 5 years ago

I want to generate training data for text recognition of scanned images.

Using the generated data, I can reach high accuracy on the training set, but when I test the model on real scanned images, it performs poorly.

I wonder whether the generated data differs too much from real scanned data. Can this code generate background noise similar to scanned images (300 dpi)?

Thank you in advance for your help!
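Besides noise, a related thing worth checking is whether the synthetic text is rendered at a scale comparable to the scans. Since one typographic point is 1/72 inch, a nominal font size maps to pixels at a given resolution as `points * dpi / 72`. The helper name `points_to_pixels` is hypothetical, and the result is the nominal em size, not the exact glyph height of any particular font.

```python
def points_to_pixels(point_size: float, dpi: int = 300) -> int:
    """Convert a nominal font size in points to pixels at `dpi`.

    1 pt = 1/72 inch, so pixels = points * dpi / 72.
    This is the em size; actual glyph heights vary by font.
    """
    return round(point_size * dpi / 72)


print(points_to_pixels(12))  # → 50
print(points_to_pixels(10))  # → 42
```

So body text set at 10 to 12 pt in a 300 dpi scan spans roughly 42 to 50 pixels per em, which is a useful reference when choosing the height of the generated images.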

Belval commented 5 years ago

There's no feature for that in the project right now, but I would be interested in adding it. Can you provide a sample of the type of noise that resembles scanning artifacts?
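Until such a feature exists, one option is to post-process the generated images yourself. The following is a hypothetical helper (`add_scan_noise`, not part of TextRecognitionDataGenerator) that applies a few generic scan-like degradations with NumPy: an off-white paper tone, a slight box blur, additive Gaussian sensor noise, and sparse dark speckles. It assumes a 2-D `uint8` grayscale image with dark text on a white background, and the default parameter values are illustrative guesses, not values calibrated against real 300 dpi scans.

```python
import numpy as np


def add_scan_noise(img, rng=None, blur_passes=1, noise_sigma=8.0,
                   speckle_prob=0.002, paper_tone=235):
    """Apply rough, uncalibrated scan-like degradation to a grayscale image.

    Assumes `img` is a 2-D uint8 array with dark text on white.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = img.astype(np.float64)
    h, w = out.shape

    # Scale intensities so pure white becomes an off-white paper tone.
    out *= paper_tone / 255.0

    # 3x3 box blur (edge-padded) to mimic the optical softening of a scanner.
    for _ in range(blur_passes):
        padded = np.pad(out, 1, mode="edge")
        out = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0

    # Additive Gaussian noise, a stand-in for sensor noise.
    out += rng.normal(0.0, noise_sigma, size=out.shape)

    # Sparse dark speckles, a stand-in for dust on the scanner glass.
    speckles = rng.random(out.shape) < speckle_prob
    out[speckles] = rng.uniform(0.0, 80.0, size=int(speckles.sum()))

    return np.clip(out, 0, 255).astype(np.uint8)


# Usage: a white 32x128 strip with a black bar standing in for text.
img = np.full((32, 128), 255, dtype=np.uint8)
img[10:22, 20:100] = 0
noisy = add_scan_noise(img, rng=np.random.default_rng(0))
```

Comparing a few outputs side by side with real scans is the practical way to tune `noise_sigma`, `speckle_prob`, and `paper_tone`. Real scanners can also introduce skew, JPEG compression, and uneven illumination, none of which this sketch models.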

Also, in the short term you can:

Finally, do check that your model isn't overfitting. Adding dropout to some layers and applying regularization could be a good way to handle this.
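To make the two suggestions concrete, here is a minimal NumPy sketch of inverted dropout (randomly zeroing units during training and rescaling the survivors so the expected activation is unchanged) and an L2 penalty term added to the loss. The function names are hypothetical. In practice you would use your framework's built-ins instead, for example PyTorch's `torch.nn.Dropout` and the `weight_decay` argument of its optimizers.

```python
import numpy as np


def inverted_dropout(activations, rate, rng, training=True):
    """Zero each unit with probability `rate` and scale survivors by 1/(1-rate).

    At inference time (training=False) the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


def l2_penalty(weights, weight_decay):
    """L2 regularization term: (weight_decay / 2) * sum of squared weights."""
    return 0.5 * weight_decay * sum(float(np.sum(w * w)) for w in weights)


rng = np.random.default_rng(0)
x = np.ones(10000)
y = inverted_dropout(x, 0.5, rng)  # entries are 0.0 or 2.0, mean close to 1
penalty = l2_penalty([np.array([1.0, 2.0]), np.array([[3.0]])], 0.1)  # ≈ 0.7
```

Because the survivors are rescaled during training, no scaling is needed at inference time, which is why the function simply returns its input when `training=False`.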