danpla / dpscreenocr

Program to recognize text on screen
https://danpla.github.io/dpscreenocr/
zlib License

Does not recognize numbers #13

Closed: manouchk38 closed this 3 years ago

manouchk38 commented 3 years ago

When using dpScreenOCR, it does not recognize numbers. dpScreenOCR should (or could) recognize numbers, both when they appear inside text and when the whole OCRed image is a number.

danpla commented 3 years ago

Hello. Can you give an example image you have issues with, and tell what languages you use for OCR?

Here's an example from https://en.wikipedia.org/wiki/List_of_numbers#SI_prefixes:

[image: 2021-07-19-095741_1600x900_scrot_cr]

And this is the text recognized with Tesseract 3.03:

One important use of integers is in orders of magnitude. A power of 10 is a number 10", where k is an integer. For instance, with k = 0, 1, 2, 3, the appropriate powers of ten are 1, 10, 100, 1000, Powers of ten can also be fractional: for instance, k = -3 gives 1/1000, or 0.001. This is used in scientific notation, real numbers are written in the form m x 10". The number 394,000 is written in this form as 3.94 x 105.

... and with Tesseract 4.0.0:

One important use of integers is in orders of magnitude. A power of 10 is a number 10K, where k is an integer. For instance, with k = 0, 1, 2, 3, ..., the appropriate powers of ten are 1, 10, 100, 1000, ... Powers of ten can also be fractional: for instance, k = -3 gives 1/1000, or 0.001. This is used in scientific notation, real numbers are written in the form m x 10”. The number 394,000 is written in this form as 3.94 x 10°.

manouchk38 commented 3 years ago

Well, I'm a bit confused. I tried with something similar to this:

[image: image.png]

and it did not work. I tried again just now and it worked. I guess this issue should be closed. I don't understand what happened.

In reply by email to Daniel Plakhotich's comment above (Mon, 19 Jul 2021, 04:18).