Open nipunasudha opened 7 years ago
Can you send your sample, so I can investigate?
These pre-processed images are actually photoshopped, but in the real experiment I used OpenCV for the processing. (I currently don't have those images with me.) Can you please improve the trained data so that it is not limited to the font you used? I can help you find different types of seven-segment samples. Many people are looking for a good seven-segment OCR, and yours is the best we have.
You can get an idea of what the preprocessed image should look like from the training source image: https://github.com/arturaugusto/display_ocr/blob/master/training_source/eng.letsgodigital.exp0.tif Note that the segments are expected to be connected. You can achieve this by using erode with OpenCV. I believe that tesseract doesn't work well with fonts where the segments are not connected.
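As a rough illustration of the erosion step mentioned above, here is a dependency-free sketch of what a grayscale erode does to dark segments on a light background. It is a stand-in written for this thread, not the project's actual preprocessing; the `erode` helper and the tiny sample image are made up for illustration. With OpenCV the same operation would be `cv2.erode(img, kernel)`.

```python
def erode(img, ksize=3):
    """Grayscale erosion: each pixel becomes the minimum of its
    ksize x ksize neighbourhood (pixels outside the image are ignored)."""
    h, w = len(img), len(img[0])
    r = ksize // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbourhood = [
                img[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(min(neighbourhood))
        out.append(row)
    return out


W, B = 255, 0  # white background, black segments

# Two dark segments on the middle row, separated by a one-pixel gap at column 4.
img = [
    [W, W, W, W, W, W, W, W, W],
    [W, W, W, W, W, W, W, W, W],
    [W, W, B, B, W, B, B, W, W],
    [W, W, W, W, W, W, W, W, W],
    [W, W, W, W, W, W, W, W, W],
]

eroded = erode(img)

# Dark regions grow by one pixel in every direction, so the gap is filled.
print("gap before:", img[2][4], "gap after:", eroded[2][4])
```

Because erosion takes the local minimum, it thickens dark strokes on a light background. If your image is inverted (light segments on dark), dilation is the operation that joins the segments instead.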
Just did some tests with your image printed:
I can see that 0 is kind of problematic. I don't know if other trained data solved the '0' issue without breaking other characters, but I will take a look. Maybe it can be fixed with some changes to the training source image.
Yes please! Also, can you place the decimal point after random digits in the training image? That helps with recognizing the decimal point. A MINIMAL VERSION (with 0..9 and the decimal point) of the trained data would also be awesome (though tesseract can be configured to achieve that). Your work is extremely helpful, thank you a lot.
Just a reminder, sir: could you take a look at this issue? It would be a great help for me and many others.
Sorry @nipunasudha. I'm quite busy right now. If you want to see this working soon, I recommend following the tesseract docs and trying to change the box files to see if that fixes the issue. Right now I can't tell you when I will be able to do this. It would also be nice to create some tests.
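For the minimal digits-plus-decimal-point case, one option that needs no retraining is Tesseract's `tessedit_char_whitelist` config variable. The sketch below only builds the command line; the image path is hypothetical, and `letsgodigital` assumes the traineddata from this project is installed in your tessdata directory. Whitelist support varies across Tesseract versions (reportedly it was not honored by the LSTM engine in 4.0), so treat this as something to try rather than a guarantee.

```python
import subprocess


def build_tesseract_cmd(image_path, lang="letsgodigital", whitelist="0123456789."):
    """Build a tesseract command that restricts output to the given characters."""
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image_path):
    """Run tesseract (must be installed and on PATH) and return the recognized text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# "display.png" is a placeholder file name.
cmd = build_tesseract_cmd("display.png")
print(" ".join(cmd))
```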
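For anyone trying the box-file route: each line of a Tesseract box file has the form `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. A small sketch for loading and relabeling boxes follows; the `Box` class, the helper names, and the sample line are hypothetical and not part of this project's scripts.

```python
from dataclasses import dataclass


@dataclass
class Box:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one box-file line: '<symbol> <left> <bottom> <right> <top> <page>'."""
    symbol, left, bottom, right, top, page = line.split()
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def format_box_line(box):
    """Turn a Box back into a box-file line."""
    return f"{box.symbol} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


# Made-up example: a glyph that was auto-labeled '8' but is really a '0'.
box = parse_box_line("8 10 20 30 60 0")
box.symbol = "0"
fixed = format_box_line(box)
print(fixed)
```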
looking into this today.
Thank you so much!
any updates?
Try using this trained data: https://github.com/Shreeshrii/tessdata_ssd/blob/master/7seg.traineddata
I already did.
I don't want to use other people's trained data, which was produced with legacy Tesseract 3.x.
--
-- Dušan Panić
Senior Security Engineer
Email: @.** Mobile: +381 60 374 58 11 GitHub: https://github.com/dpanic Twitter: https://twitter.com/dusan_panic LinkedIn: *https://www.linkedin.com/in/du%C5%A1an-pani%C4%87-5933731b2/
"Tell me and I forget. Teach me and I remember. Involve me and I learn." ~ Benjamin Franklin
Also, I'm working on this: https://arturaugusto.github.io/7seg-ocr/ Maybe it can be an alternative that doesn't need tesseract.
Thank you Artur. I will check it out. However, I must figure out how Apple does OCR on iOS; they do it pretty well.
Seven-segment displays are harder to OCR with trained models if you need close to 100% accuracy; that's why I'm working on analytical solutions. There is also this one (not open source): https://ocr.ipt.br/?font=7seg. It uses tesseract OCR compiled to wasm.
@arturaugusto Thanks. I already managed to achieve 94% accuracy with your letsgodigital.traineddata. The problem is that it is too slow (long story)... I run multiple OCR passes on multiple images, then I summarize the results.
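To illustrate the general idea behind an analytical approach (this is not necessarily how the 7seg-ocr project linked above works): once you can tell which of the seven segments are lit, decoding is a table lookup, and 0 and 8 differ only in the middle segment. The segment letters follow the usual a to g convention (a = top, then clockwise, g = middle); the `decode` helper is made up for this sketch.

```python
# Lit segments for each digit, using the conventional labels:
# a = top, b = top right, c = bottom right, d = bottom,
# e = bottom left, f = top left, g = middle.
SEGMENTS = {
    "abcdef": "0",
    "bc": "1",
    "abdeg": "2",
    "abcdg": "3",
    "bcfg": "4",
    "acdfg": "5",
    "acdefg": "6",
    "abc": "7",
    "abcdefg": "8",
    "abcdfg": "9",
}

LOOKUP = {frozenset(segs): digit for segs, digit in SEGMENTS.items()}


def decode(lit_segments):
    """Map a collection of lit segment labels to a digit, or '?' if unknown."""
    return LOOKUP.get(frozenset(lit_segments), "?")


print(decode("abcdef"))   # middle segment off
print(decode("abcdefg"))  # middle segment on
```

The hard part in practice is deciding reliably whether each segment is lit; the lookup itself cannot confuse 0 with 8 as long as the middle segment is measured correctly.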
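One simple way to summarize several OCR passes over the same reading is a majority vote. This is only a guess at what such a summarizing step could look like, not the actual pipeline described above; the function name and the sample readings are invented.

```python
from collections import Counter


def majority_vote(readings):
    """Return the most frequent non-empty reading and the share of votes it received."""
    votes = Counter(r.strip() for r in readings if r and r.strip())
    if not votes:
        return None, 0.0
    winner, count = votes.most_common(1)[0]
    return winner, count / sum(votes.values())


# Made-up OCR results for the same display from differently preprocessed images.
readings = ["10.8", "18.8", "10.8", "", "10.8"]
value, confidence = majority_vote(readings)
print(value, confidence)
```

The vote share can double as a cheap confidence score, for example to decide when further OCR passes can be skipped to save time.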
I am building my own training...
Btw, your https://github.com/arturaugusto/display_ocr doesn't contain all the data you used to build the ~10 MB traineddata :-)
I am using your scripts and Tesseract 3. I end up with a 140 KB file, which has 70% accuracy... I used around 1000 pictures, combined into multiple TIFFs, for training.
I'm using the traineddata file from this project to recognize seven-segment digits. Even for a very clear, high-resolution sample, 0 is recognized as 8. I searched Stack Overflow, and the issue seems to be common. Take a look. http://stackoverflow.com/questions/30479002/digital-numbers-on-tesseract-ocr