Urdu Text Does Not Get Detected

Aeyxen commented 10 months ago

First things first: sincere appreciation for your outstanding work in developing this incredible AI-driven OCR library. It's a fantastic tool that holds immense potential for digital humanities, which is the subject I study.

I started my testing with some old Urdu historical documents, and unfortunately, I didn't observe any bounding box (Bbox) detection for the Urdu text within those documents.

Subsequently, I tested it with an image that contains a mix of Hindi, English, and Urdu text. To my delight, it successfully detected the Hindi and English portions of the text. However, it only recognized one line of the Urdu text, which was less than expected. I have attached the image for your reference so that you can better understand the scenario.

[Attached image: image5-602w291h_0_bbox]

VikParuchuri commented 10 months ago

Try the new code/model - pip install -U surya

VikParuchuri commented 10 months ago

This seems to work.

You may need to experiment with the threshold settings to detect more text (see README).
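For anyone wanting to try the threshold suggestion systematically, here is a minimal sketch of a small sweep. It assumes the detector reads its thresholds from two environment variables named `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD`, both in the 0 to 1 range with the text threshold above the blank threshold. Those names and constraints are an assumption, not something stated in this thread, so check the README of the version you have installed. The helpers `threshold_sweep` and `build_env` are hypothetical names made up for this illustration, and the values are arbitrary starting points.

```python
import itertools
import os

# ASSUMPTION: setting names as recalled from the README; verify for your version.
BLANK_VAR = "DETECTOR_BLANK_THRESHOLD"
TEXT_VAR = "DETECTOR_TEXT_THRESHOLD"


def threshold_sweep(blank_values, text_values):
    """Return env-var overrides for every pair where 0 <= blank < text <= 1."""
    combos = []
    for blank, text in itertools.product(blank_values, text_values):
        if 0.0 <= blank < text <= 1.0:
            combos.append({BLANK_VAR: str(blank), TEXT_VAR: str(text)})
    return combos


def build_env(overrides, base_env=None):
    """Copy the base environment (os.environ by default) and apply overrides."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(overrides)
    return env


# Arbitrary example values; pairs with blank >= text are skipped.
sweep = threshold_sweep([0.2, 0.35], [0.3, 0.6])
for overrides in sweep:
    print(overrides)
```

Each resulting dictionary can be passed through `build_env` and handed to `subprocess.run(..., env=...)` together with the detection command from the README, so that every run uses one threshold pair and the bounding box output for the Urdu lines can be compared across runs.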

Aeyxen commented 8 months ago

Noted with thanks

VikParuchuri / surya

Urdu Text Does Not Get Detected #6