Shreeshrii / tessdata_shreetest

finetuned traineddata files for tesseract 4.0.0 for testing

Accounting\currency version #9

Open oldominion opened 5 years ago

oldominion commented 5 years ago

Could you possibly add a traineddata file specialized for accounting purposes? It would need the digits 0-9, dot, comma, various currency symbols such as '$£€', dash, colon/semicolon, etc.

€ is the main problem for me, it's invariably detected as a 6 or an 8 instead of being ignored, and since I'm looking for digits only, I have no way of correcting the output via post processing.
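To illustrate the point, here is a minimal Python sketch of the kind of post-processing that only works when the model outputs the currency symbol as itself. The symbol set and the `keep_digits` helper are hypothetical, not part of tesseract or this repository.

```python
import re

# Hypothetical symbol set; extend it to match whatever the model can emit.
CURRENCY_SYMBOLS = "$£€"


def keep_digits(ocr_text: str) -> str:
    """Drop currency symbols and whitespace, keeping digits and separators."""
    pattern = f"[{re.escape(CURRENCY_SYMBOLS)}\\s]"
    return re.sub(pattern, "", ocr_text)


# Symbol recognized correctly: it can simply be filtered out.
print(keep_digits("€ 1,234.56"))  # 1,234.56

# Symbol misread as a digit: the spurious 6 is indistinguishable from real data.
print(keep_digits("6 1,234.56"))  # 61,234.56
```

In the second case no filter can tell the stray 6 from a genuine digit, which is why the symbol has to be in the model's character set.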

Shreeshrii commented 5 years ago

Please see https://github.com/tesseract-ocr/tessdata/pull/120

oldominion commented 5 years ago

Thanks! Sadly that one doesn't include currency symbols, so it doesn't help avoid the frequent misclassification of €, for example.

Shreeshrii commented 5 years ago

If you can make a training text with the kind of symbols you need, I can run the training.

See samples of training text used for other traineddata:

https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text

https://github.com/Shreeshrii/tessdata_shreetest/blob/master/engrestrict.training_text
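A rough sketch of how such a training text could be generated, assuming a training text is plain UTF-8 lines of sample tokens like the files linked above. The symbol list, amount format, and function names are illustrative assumptions, not the format used for any existing traineddata; colon and semicolon from the original request are left out and would need their own token styles.

```python
import random

# Assumed symbol set, taken from the request earlier in this thread.
CURRENCY_SYMBOLS = ["$", "£", "€"]


def random_amount(rng: random.Random) -> str:
    """Return an amount such as 1,234,567.89 (comma grouping, dot decimals)."""
    whole = rng.randint(0, 9_999_999)
    cents = rng.randint(0, 99)
    return f"{whole:,}.{cents:02d}"


def make_token(rng: random.Random) -> str:
    """Return one amount in a randomly chosen accounting-style layout."""
    amount = random_amount(rng)
    symbol = rng.choice(CURRENCY_SYMBOLS)
    style = rng.choice(["prefix", "suffix", "negative", "bare"])
    if style == "prefix":
        return f"{symbol}{amount}"
    if style == "suffix":
        return f"{amount}{symbol}"
    if style == "negative":
        return f"-{symbol}{amount}"
    return amount


def make_training_text(num_lines: int = 1000, tokens_per_line: int = 6,
                       seed: int = 0) -> str:
    """Build num_lines lines, each holding tokens_per_line space-separated tokens."""
    rng = random.Random(seed)
    lines = [
        " ".join(make_token(rng) for _ in range(tokens_per_line))
        for _ in range(num_lines)
    ]
    return "\n".join(lines)


# Preview a few lines; a real run would write the full text to a file instead.
print(make_training_text(num_lines=3))
```

Fixing the seed keeps the output reproducible, so the same text can be regenerated if the training needs to be rerun.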

samrood121 commented 3 years ago

@Shreeshrii, is there any way you could help me make a traineddata file for a single image (it is basically a check box) for my project?