deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Extract text from image "ticket" #319

Closed: hanane2019 closed this issue 4 years ago

hanane2019 commented 4 years ago

[Image: example_04]

Hello everybody,

I’ve tried many Python libraries to extract the text from these tickets, but none of them gave correct results. I even applied filtering and preprocessing to the images, but I still could not extract all of the text. The elements I most want to extract are the date, the time, the amount TTC, and the serial number located in the middle of the ticket, such as 31950 A 34. Could anyone please help me with this?

jpweytjens commented 4 years ago

Textract is a wrapper for other tools; it doesn't implement any text extraction methods itself. I would recommend following the Tesseract advice for better extraction and going from there. I'm closing this issue as it's not directly related to textract.
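For readers landing here with the same problem, below is a minimal sketch of pulling the requested fields out of OCR text once the OCR output is usable. It is not part of textract. The helper names `ocr_ticket` and `parse_ticket` are hypothetical, and every regular expression encodes an assumption about the ticket layout (numeric dates such as `12/03/2019`, times such as `14:35`, an amount printed after the label `TTC`, and a serial shaped like `31950 A 34`), since the original image is not available here. `textract.process(..., method="tesseract")` is the documented way to route an image through Tesseract.

```python
import re
from typing import Dict, Optional

# All patterns below are assumptions about the ticket layout, not known facts.
DATE_RE = re.compile(r"\b\d{2}[/.-]\d{2}[/.-]\d{2,4}\b")       # e.g. 12/03/2019
TIME_RE = re.compile(r"\b(?:[01]?\d|2[0-3])[:hH][0-5]\d\b")    # e.g. 14:35 or 14h35
AMOUNT_TTC_RE = re.compile(r"TTC[^\d\n]*(\d+[.,]\d{2})", re.IGNORECASE)
SERIAL_RE = re.compile(r"\b(\d{3,6})[ \t]+([A-Z])[ \t]+(\d{1,3})\b")  # e.g. 31950 A 34


def ocr_ticket(path: str) -> str:
    """Run OCR on an image through textract's Tesseract backend."""
    # Imported lazily so parse_ticket() can be used without textract installed.
    import textract
    return textract.process(path, method="tesseract").decode("utf-8")


def parse_ticket(text: str) -> Dict[str, Optional[object]]:
    """Extract date, time, amount TTC and serial from OCR text.

    Each field is None when its (assumed) pattern is not found.
    """
    date = DATE_RE.search(text)
    time = TIME_RE.search(text)
    amount = AMOUNT_TTC_RE.search(text)
    serial = SERIAL_RE.search(text)
    return {
        "date": date.group(0) if date else None,
        "time": time.group(0) if time else None,
        # Normalise a decimal comma to a dot before converting to float.
        "amount_ttc": float(amount.group(1).replace(",", ".")) if amount else None,
        # Re-join the three serial parts with single spaces.
        "serial": " ".join(serial.groups()) if serial else None,
    }
```

A typical call would be `parse_ticket(ocr_ticket("ticket.jpg"))`, where the file name is a placeholder. If the OCR text itself is wrong, no pattern will rescue it, so improving the image before OCR, as advised above, comes first.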