Open zeromas opened 4 years ago
I have also included numbers in the training images, but they are not sufficient, because Urdu is written right-to-left (RTL) while numbers are written left-to-right (LTR).
Why don't we build an Urdu number classifier: detect the numbers in test.jpeg along with their locations, train a separate model just for numbers, and, once we have the recognized numbers, put them back into the locations we recorded earlier?
@UBISOFT-1 A better approach is to reverse the order of every number sequence in the dataset (text labels only) and then train again.
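The label preprocessing suggested above could be sketched roughly as follows. This is a minimal, hypothetical example (not from the repo): it reverses each contiguous run of digits in a label string so that character order becomes uniform with the surrounding RTL text; it covers ASCII, Arabic-Indic, and Extended Arabic-Indic (Urdu) digits.

```python
import re

# Match runs of ASCII digits (0-9), Arabic-Indic digits (U+0660-U+0669),
# and Extended Arabic-Indic / Urdu digits (U+06F0-U+06F9).
DIGIT_RUN = re.compile(r"[0-9\u0660-\u0669\u06F0-\u06F9]+")

def reverse_digit_runs(label: str) -> str:
    """Reverse each contiguous digit sequence inside a label string.

    In an Urdu line stored in logical order, digit sequences read LTR
    while the rest of the text reads RTL; reversing the digit runs in
    the training labels makes the character order consistent.
    """
    return DIGIT_RUN.sub(lambda m: m.group(0)[::-1], label)

print(reverse_digit_runs("price 1250 total"))  # -> "price 0521 total"
```

At inference time the same transform would be applied in reverse to the model's output, restoring the digits to their original LTR order.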
@HassamChundrigar, yeah, that is indeed a better approach. Why don't you train on the dataset, and maybe extend it to support multi-line OCR as well?
Thanks for highlighting this. The textual data is mainly extracted from magazine stories, so there are only a few examples of numbers, which is not sufficient for training on numerals. There are also multiple formats for writing numerals: some use Arabic digits and some mix Arabic and English digits. Multi-line OCR needs segmentation of the text lines from the document, so it may become another module.
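The line-segmentation module mentioned above is commonly done with a horizontal projection profile. Here is a hedged sketch, assuming a binarized page image (text pixels = 1, background = 0) as a NumPy array; the function name and interface are illustrative, not part of this project:

```python
import numpy as np

def segment_lines(binary: np.ndarray) -> list:
    """Split a binarized page into (top, bottom) row bands, one per text line.

    Sums ink pixels along each row (horizontal projection profile) and
    treats maximal runs of non-empty rows as text lines.
    """
    has_ink = binary.sum(axis=1) > 0  # True for rows containing any text
    bands, start = [], None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y                  # a text band begins
        elif not ink and start is not None:
            bands.append((start, y))   # band ends at the first empty row
            start = None
    if start is not None:
        bands.append((start, len(has_ink)))
    return bands
```

Each returned band can then be cropped and fed to the single-line OCR model independently. Real scans would first need deskewing and a gap threshold to avoid over-splitting noisy lines.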
I am trying to OCR images that contain numbers as well. Can you guide me on how to include them?