Shreeshrii / tessdata_ssd

Tesseract 4 traineddata for recognizing Seven Segment Display
Apache License 2.0

Better results with 7seg.traineddata #1

Open Shreeshrii opened 5 years ago

Shreeshrii commented 5 years ago

Tests on a small sample of real-life images give better results with the older 7seg.traineddata. Unfortunately, I have deleted the source training_text for it. Customizing the training text and fonts to the requirements, and preprocessing the images to reduce the gaps between segments, will lead to better recognition.

ssd202

* ssd202 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 2.02

* ssd202 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 1 2.02

ssd1

* ssd1 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 22.0

* ssd1 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 22.12

ssd2

* ssd2 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 29: L. I1.0

* ssd2 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 29 11.0

ssd3

* ssd3 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 4:05:30

* ssd3 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 801

ssd4

* ssd4 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 10.5°

* ssd4 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 10.5°

ssd5

* ssd5 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 4:05:30

* ssd5 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 4:05:30

ssd6

* ssd6 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 10.5°

* ssd6 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 10.5°

ssd7

* ssd7 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 29:

* ssd7 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 11

ssd8

* ssd8 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 05:54:09

* ssd8 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 05:54:09

ssd9

* ssd9 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 7:45

* ssd9 LANG ssd TESSDATA tessdata_ssd OEM 1 PSM 6 7:45
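
The listing above pairs each image with the output of two models under the same settings. A comparison of this kind can be sketched as a small shell function. This is not the script that produced the results above: `compare_models`, the `TESSDATA_DIR` variable, and the image name in the usage line are hypothetical. The flags mirror the settings printed in each result line (`OEM 1`, `PSM 6`).

```shell
# Sketch only: directory and image names are placeholders, not taken from the repository.
TESSDATA_DIR="${TESSDATA_DIR:-./tessdata_ssd}"

compare_models() {
  # usage: compare_models IMAGE LANG [LANG...]
  img="$1"; shift
  for lang in "$@"; do
    printf '%s LANG %s: ' "$img" "$lang"
    # --oem 1 selects the LSTM engine; --psm 6 treats the image as one uniform block of text.
    # stderr is discarded so warnings (such as the missing-dictionary message) stay out of
    # the comparison; drop the redirect when debugging, since it also hides real errors.
    tesseract "$img" stdout --tessdata-dir "$TESSDATA_DIR" -l "$lang" --oem 1 --psm 6 2>/dev/null
  done
}
```

Example call: `compare_models ssd1.png 7seg ssd`. Judging by the results above, recognition still proceeds after the "Failed to load any lstm-specific dictionaries" message, so it is treated here as a warning rather than a failure.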

Shreeshrii commented 5 years ago

ssdPicture1

* ssdPicture1 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! CO2

* ssdPicture1 LANG ssd_alphanum_plus TESSDATA tessdata_ssd OEM 1 PSM 6 C02

ssdPicture2

* ssdPicture2 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! E09

* ssdPicture2 LANG ssd_alphanum_plus TESSDATA tessdata_ssd OEM 1 PSM 6 83

ssdPicture3

* ssdPicture3 LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 171171 171 D0

* ssdPicture3 LANG ssd_alphanum_plus TESSDATA tessdata_ssd OEM 1 PSM 6 0 0 0 1

Same image as above, but blurred, converted to greyscale, and then converted to black and white to close the gaps between segments

ssdPicture3-bw

* ssdPicture3-bw LANG 7seg TESSDATA tessdata_ssd OEM 1 PSM 6 Failed to load any lstm-specific dictionaries for lang 7seg!! 888

* ssdPicture3-bw LANG ssd_alphanum_plus TESSDATA tessdata_ssd OEM 1 PSM 6 888
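
The thread does not say which tool was used for the blur, greyscale, and black-and-white steps. One way to sketch them, assuming ImageMagick is installed, is shown below. `preprocess_ssd` and its default values are hypothetical and would need tuning per display; on ImageMagick 7 the command is usually `magick` rather than `convert`.

```shell
# Sketch only: ImageMagick is an assumption, and the defaults are starting points, not tuned values.
preprocess_ssd() {
  # usage: preprocess_ssd INPUT OUTPUT [BLUR_SIGMA] [THRESHOLD]
  src="$1"; dst="$2"
  sigma="${3:-2}"        # a larger sigma bridges wider gaps but can merge neighbouring digits
  threshold="${4:-50%}"  # cut-off between black and white, applied after blurring
  # Blur first so segments bleed into the gaps, then reduce to greyscale and binarize.
  convert "$src" -blur "0x${sigma}" -colorspace Gray -threshold "$threshold" "$dst"
}
```

Example call: `preprocess_ssd ssdPicture3.png ssdPicture3-bw.png`, followed by the same tesseract invocation as before on the output file.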

wahid18benz commented 5 years ago

Thank you for this great work. Can you show how I can test it on my dataset? Which file should I download, and how do I use it? Thank you @Shreeshrii

Shreeshrii commented 5 years ago

wget https://github.com/Shreeshrii/tessdata_ssd/raw/master/7seg.traineddata

Copy the traineddata file to your tessdata-dir (where other traineddata files are).

Check with

tesseract --list-langs

Use with

-l 7seg

Similarly for the other trained data files.
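
Put together, the steps above might look like the following sketch. The model URL is the one given above; the function names, the tessdata directory, and the image name are placeholders. `--oem 1 --psm 6` matches the settings used for the results earlier in this thread, and passing `--tessdata-dir` explicitly is an assumption that avoids depending on the system-wide tessdata location.

```shell
# Sketch only: function names and directory arguments are placeholders.
MODEL_URL="https://github.com/Shreeshrii/tessdata_ssd/raw/master/7seg.traineddata"

install_and_check() {
  # usage: install_and_check TESSDATA_DIR
  tessdata_dir="$1"
  # -P saves the download into the given directory.
  wget -q -P "$tessdata_dir" "$MODEL_URL" || return 1
  # The new model should show up in this list as "7seg".
  tesseract --tessdata-dir "$tessdata_dir" --list-langs
}

recognize() {
  # usage: recognize IMAGE TESSDATA_DIR
  tesseract "$1" stdout --tessdata-dir "$2" -l 7seg --oem 1 --psm 6
}
```

Example calls: `install_and_check ./tessdata_ssd`, then `recognize image.png ./tessdata_ssd`.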

wahid18benz commented 5 years ago

Thank you @Shreeshrii, you've done great work. How can I train the model with my own dataset? I have a large dataset of multimeter seven-segment images, but it's not annotated. Could you help me with the training procedure and the tools I need? I have seen https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 but it's not very clear to me. Thanks.

Shreeshrii commented 5 years ago

See https://github.com/Shreeshrii/tessdata_ssd and https://github.com/Shreeshrii/tessdata_ssd/blob/master/finetune.sh

Modify and run with your text.

Shreeshrii commented 5 years ago

For training with images see https://github.com/OCR-D/ocrd-train
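
Training from images requires annotation: as that project's README describes it, each line image is paired with a single-line transcription that has the same base name and the extension `.gt.txt`. For an unannotated dataset, a small check like the one below can list the images that still need a transcription. The function name is hypothetical, and accepting both `.tif` and `.png` is an assumption, since the supported image extensions depend on the project version.

```shell
# Sketch only: lists line images that have no non-empty .gt.txt transcription next to them.
missing_transcriptions() {
  # usage: missing_transcriptions GROUND_TRUTH_DIR
  for img in "$1"/*.png "$1"/*.tif; do
    [ -e "$img" ] || continue          # skip glob patterns that matched nothing
    gt="${img%.*}.gt.txt"              # e.g. meter_001.tif -> meter_001.gt.txt
    [ -s "$gt" ] || printf '%s\n' "$img"
  done
}
```

Example call: `missing_transcriptions ./ground-truth`. An empty result means every image in the directory has a matching, non-empty transcription file.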