auerswal / ssocr

Seven Segment Optical Character Recognition
https://www.unix-ag.uni-kl.de/~auerswal/ssocr/index.html
GNU General Public License v3.0

Feature Request: Need option to disambiguate some characters #4

Closed Lotharyx closed 6 years ago

Lotharyx commented 6 years ago

Different seven-segment drivers display certain Arabic numerals differently. For example, the numeral six (6) is sometimes displayed without the top horizontal segment (appearing like the letter "b"). It is also common to see the numeral seven (7) displayed with or without the top left vertical segment.

Specific to me, I want to use ssocr to recognize the displays on gas-station pumps. ssocr wants to identify the numeral 6 as the letter "b". I could run a separate operation on the output of ssocr to turn "b" into "6", of course, but it would be a nice feature if ssocr could be told that the input consists only of numerals and no letters, for example.

Here is a sample debug image which demonstrates the issue: https://imgur.com/a/RGC0VVM

ssocr will output ".14b9" instead of "1469" (also it is identifying noise as a decimal point; will one of the options help with that?).

auerswal commented 6 years ago

Well, something like ssocr image.png | tr b 6 solves the issue with the number 6 vs. the letter b.

Having said that, I do realize that it would be nice to have an option to tell ssocr that what looks like a b should be interpreted as a 6.

Regarding the question of noise being recognized as a decimal point, the options -n and -i are intended to help with relatively small groups of pixels, as shown on the bottom left of the linked image. The opening command can reduce or even remove small groups of pixels as well. The remove_isolated command is intended to remove individual pixels; the problematic pixel group in the linked image looks too big for remove_isolated. To remove noise from the image edges, you can try the white_border command. The problematic noise in the linked image is well outside the digit area.
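As a rough sketch of how such commands are placed on the command line: ssocr takes its image-processing commands between the options and the image file name. The helper name read_denoised and the file name pump.png are made up for illustration, and running white_border and opening without explicit arguments is an assumption that would need tuning against the real image.

```shell
# Hypothetical helper: apply two of the clean-up commands named above
# before recognition. Which commands (and which arguments) actually
# help depends on the image, so treat this as a starting point only.
read_denoised() {
    # Commands go after the options and before the image file name.
    ssocr white_border opening "$1"
}

# Usage (assumes pump.png exists and ssocr is installed):
#   read_denoised pump.png
```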

Anyway, something like ssocr image.png | tr -d . filters any decimal point from the output.

auerswal commented 6 years ago

I have just put version 2.19.0 of ssocr on the web page https://www.unix-ag.uni-kl.de/~auerswal/ssocr/. Direct link to source: https://www.unix-ag.uni-kl.de/~auerswal/ssocr/ssocr-2.19.0.tar.bz2

It adds two new options: -c, --charset and -C, --omit-decimal-point.

If you want to detect only decimal digits and ignore decimal points, you can use: ssocr -d-1 -c decimal -C

This will recognize a six with missing top segment as 6, not b.
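A small wrapper around that command might look like the sketch below. The function name read_digits and the extra digits-only check are not part of ssocr; they are assumptions added for the gas-pump use case, where anything other than digits in the output suggests a misread. It requires ssocr 2.19.0 or later for -c and -C.

```shell
# Hypothetical wrapper: read a display image as decimal digits only.
read_digits() {
    # Options as given in the comment above: -d-1, -c decimal, -C.
    out=$(ssocr -d-1 -c decimal -C "$1") || return 1
    # Extra sanity check (an assumption, not ssocr behavior):
    # reject empty output or anything that is not a plain digit.
    case $out in
        ''|*[!0-9]*)
            echo "unexpected reading: $out" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$out"
}

# Usage (assumes pump.png exists and ssocr is installed):
#   read_digits pump.png
```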

Lotharyx commented 6 years ago

Cool! Thanks!

On 08/05/2018 05:15 AM, Erik Auerswald wrote:

I have just put version 2.19.0 of ssocr on the web page https://www.unix-ag.uni-kl.de/~auerswal/ssocr/. Direct link to source: https://www.unix-ag.uni-kl.de/~auerswal/ssocr/ssocr-2.19.0.tar.bz2

It adds two new options: -c, --charset and -C, --omit-decimal-point.

If you want to detect only decimal digits and ignore decimal points, you can use: ssocr -d-1 -c decimal -C

This will recognize a six with missing top segment as 6, not b.
