auerswal / ssocr

Seven Segment Optical Character Recognition
https://www.unix-ag.uni-kl.de/~auerswal/ssocr/index.html
GNU General Public License v3.0

OCR from Prints/Images having super-imposed Dates #12

Closed jotzet79 closed 3 years ago

jotzet79 commented 3 years ago

Dear Erik,

I'm just experimenting with your cool library, but I'm currently stuck trying to get images like the one attached to work. Do you have any idea how I could do that?

[Image: Film7729_edit_34 JPG_si]

Thanks, Joachim

auerswal commented 3 years ago

Hello Joachim,

at first glance the image looks as if using --luminance=red together with --foreground=white might help.

The threshold needs some adjustment as well, and the apostrophe is not recognized (ssocr lacks a "--decimal-ratio" option to adjust the hard-coded value; with such an option it could be made to recognize the apostrophe as a decimal point):

ssocr -f white -l red -t 90 115153100-784e9b00-a074-11eb-89ee-973c09088ab3.png
510896
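Finding a workable threshold is mostly trial and error, so a small loop can help. The following is only a sketch: the function name sweep_thresholds and the candidate values 70, 80 and 90 are made up for illustration, and it assumes that ssocr signals a failed recognition with a non-zero exit status.

```shell
# Hypothetical helper: try a few thresholds and print each one that
# yields a result, together with the recognized digits.
sweep_thresholds() {
    image=$1
    # The candidate values are arbitrary; adjust them for your images.
    for t in 70 80 90; do
        # Assumption: a non-zero exit status means recognition failed.
        if out=$(ssocr -f white -l red -t "$t" "$image" 2>/dev/null); then
            printf '%s %s\n' "$t" "$out"
        fi
    done
}
```

Calling `sweep_thresholds image.png` prints one line per threshold that produced a result, so implausible readings can be compared by eye.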

HTH, Erik

(Sent by email in reply to jotzet79's message of Sun, Apr 18, 2021 at 09:33:20AM -0700.)

auerswal commented 3 years ago

I have just added options to control recognition of the decimal separator to ssocr and released this as version 2.20.0.

$ ssocr -t90 -lred -fwhite -H2 foto_mit_datum.png 
510.96
$ ssocr -t90 -lred -fwhite -H2 foto_mit_datum.png | tr . \'
510'96

The dates seem to have a varying number of digits, so you should probably add the option --number-digits=-1. You may also need some heuristic to determine how many digits are used for the day and how many for the month. ssocr does not consider the amount of white space between digits; it finds just the digits and thus throws away digit grouping information.
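One possible shape for such a heuristic, as a rough sketch rather than anything ssocr provides: list every way of splitting the ungrouped day/month digits into two fields and keep only the plausible ones. The function name split_candidates is hypothetical, and the plausibility rule (each field one or two digits long and between 1 and 31, at least one of them at most 12) is an assumption, since which field is the day and which is the month is not known here.

```shell
# Hypothetical helper: print plausible two-field splits of a digit string.
# The input is assumed to consist of digits only.
split_candidates() {
    digits=$1
    len=${#digits}
    i=1
    while [ "$i" -lt "$len" ]; do
        # Split after the first i characters.
        first=$(printf '%s' "$digits" | cut -c "1-$i")
        second=$(printf '%s' "$digits" | cut -c "$((i + 1))-")
        i=$((i + 1))
        # Each field must be one or two digits long.
        [ "${#first}" -le 2 ] && [ "${#second}" -le 2 ] || continue
        # Assumption: both fields lie in the range 1..31 ...
        [ "$first" -ge 1 ] && [ "$first" -le 31 ] || continue
        [ "$second" -ge 1 ] && [ "$second" -le 31 ] || continue
        # ... and at least one of them could be a month.
        [ "$first" -le 12 ] || [ "$second" -le 12 ] || continue
        printf '%s %s\n' "$first" "$second"
    done
}
```

For the digits before the apostrophe in the example, `split_candidates 510` leaves only `5 10`. An input such as `111` stays ambiguous (`1 11` and `11 1`), which is exactly the case where the digits alone are not enough.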

auerswal commented 3 years ago

ssocr up to version 2.20.0 throws away spacing information, but this use case needs it to distinguish between dates with a single-digit day and two-digit month and dates with a two-digit day and single-digit month. Thus I have just added options to ssocr to print space characters based on the relative spacing of digits.

$ ssocr -t90 -lred -fwhite -H2 -s -G foto_mit_datum.png | tr . \'
5 10'96
$ ssocr -t90 -lred -fwhite -H2 -s -A2 foto_mit_datum.png | tr . \'
5 10'96
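With the grouping preserved, the output can be split into fields in the shell. This is a minimal sketch: parse_stamp is a hypothetical name, and it assumes exactly the format shown above, that is, two fields separated by a space, then an apostrophe, then the year. Whether the first field is the day or the month depends on the camera and is not decided here.

```shell
# Hypothetical helper: split a string such as 5 10'96 into three fields.
parse_stamp() {
    printf '%s\n' "$1" | {
        # Use both the space and the apostrophe as field separators.
        IFS=" '" read -r first second year
        printf 'first=%s second=%s year=%s\n' "$first" "$second" "$year"
    }
}
```

`parse_stamp "5 10'96"` prints `first=5 second=10 year=96`. If ssocr emits no space between the first two fields, the third field stays empty, which can serve as a cue to fall back to some other heuristic.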

I have released this as ssocr version 2.21.0.

Since ssocr version 2.21.0 should provide everything required for recognition of the kind of images asked about in this issue, I intend to close the issue in a couple of days.

jotzet79 commented 3 years ago

I didn’t dare to ask, honestly, but I guess my question might have ignited your curiosity…

That’s super cool, thanks a bunch! Sorry for the delayed feedback - I was on sick leave last week… I’ll try it out in the next couple of days and let you know the results!

auerswal commented 3 years ago

I hope you are better now!

Indeed your question sparked my curiosity. :-)

Both added ssocr options have been on my tentative To-Do list for quite some time, but until now I lacked a tangible use-case.

auerswal commented 3 years ago

Hello Joachim,

did you find time to try out the new ssocr features?

Thanks, Erik

auerswal commented 3 years ago

I would say the question starting this issue has been answered, and the ssocr enhancements mentioned above allow addressing the problems encountered when using ssocr with the above image. Thus I am closing this issue.