chefjuanpi closed this issue 9 years ago
Sorry for the delay. Yes, Tesseract does consider the whole word and has a built-in dictionary. It's giving you different results each time because it uses an adaptive algorithm (see their FAQ).
Anyway, I'd ignore that for now. First I'd disable the dictionary (see their docs, https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html, config file section) and maybe try configuring some new user patterns. Finally, you might try training your own language using the official tools provided by Tesseract; it's likely to improve the accuracy, given that the characters aren't similar to common fonts like Arial.
On 27 Nov 2014 09:12, "Pablo Aguilar Lliguin" notifications@github.com wrote:
Hi
I'm trying to use Tesseract to read a car license plate. I crop the image to just the plate code and apply AForge filters to make it easier to read, but the text result is weird:
For example, this is my raw plate code image:
[image: platetextraw0] https://cloud.githubusercontent.com/assets/7308580/5209330/56293276-758b-11e4-9f14-62462bce1954.jpg
After the filters, I tried processing it with a black background:
[image: plate0] https://cloud.githubusercontent.com/assets/7308580/5209396/2d880846-758c-11e4-913c-53dd5f3f9831.jpg
but I get better results with a white background:
[image: plate19] https://cloud.githubusercontent.com/assets/7308580/5209525/bf43e59c-758d-11e4-9e1f-eaf9ddea5373.jpg
Most of the time it reads
M05 ACE
sometimes H05 AEE or
AEC, but never the correct code, M05 ACC.
I don't understand why it reads the two Cs differently, one as E and one as C. I think Tesseract tries to read a meaningful word, but a plate is a code of numbers and letters.
Is it possible to configure Tesseract to read character by character instead of reading words?
My code:
TesseractEngine _ocr;

// Load the English language data from the tessdata folder next to the executable.
string tessdata = Application.StartupPath + @"\tessdata\";
_ocr = new TesseractEngine(tessdata, "eng", EngineMode.TesseractOnly);
// Restrict recognition to this set of letters and digits.
_ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHJKLMNPQRSTVWXYZ1234567890");

private string Ocr(Bitmap image)
{
    // Dispose the Pix and the Page even if recognition throws.
    using (Pix pixplate = PixConverter.ToPix(image))
    using (var plateText = _ocr.Process(pixplate))
    {
        return plateText.GetText();
    }
}
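As a rough illustration of the "disable the dictionary" suggestion above: Tesseract config files are plain text with one `name value` pair per line. A minimal sketch, assuming a Tesseract 3.x install and a hypothetical config file named `plates` placed in `tessdata\configs\`, might look like the following. The file name is made up for illustration. `load_system_dawg` and `load_freq_dawg` control whether the main and frequent-word dictionaries are loaded, and the whitelist is copied from the code in the question.

```
load_system_dawg F
load_freq_dawg F
tessedit_char_whitelist ABCDEFGHJKLMNPQRSTVWXYZ1234567890
```

As far as I know, the two dictionary parameters are read when the engine initializes, so setting them afterwards with SetVariable is unlikely to have any effect; the config file needs to be named when the engine is created. Whether and how the .NET wrapper accepts config file names depends on the version in use, so check which TesseractEngine constructor overloads are available to you.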
Hi
I'm trying to use Tesseract to read a car license plate. I crop the image to just the plate code and apply AForge filters to make it easier to read, but the result is weird:
For example, this is my raw plate code image:
After the filters, I tried processing it with a black background,
but I get better results with a white background.
Most of the time it reads
M05 ACE
sometimes H05 AEE or
AEC, but never the correct code.
At first I thought about doing some training, but I don't understand why it reads the two Cs differently, one as E and one as C. I think Tesseract tries to read a meaningful word, but a plate is a code of numbers and letters.
Is it possible to configure Tesseract to read character by character, or something similar?
My code:
TesseractEngine _ocr;

// Load the English language data from the tessdata folder next to the executable.
string tessdata = Application.StartupPath + @"\tessdata\";
_ocr = new TesseractEngine(tessdata, "eng", EngineMode.TesseractOnly);
// Restrict recognition to this set of letters and digits.
_ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHJKLMNPQRSTVWXYZ1234567890");

private string Ocr(Bitmap image)
{
    // Dispose the Pix and the Page even if recognition throws.
    using (Pix pixplate = PixConverter.ToPix(image))
    using (var plateText = _ocr.Process(pixplate))
    {
        return plateText.GetText();
    }
}