JohnWang0512 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Persian (Farsi) #776

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Would you please add the Farsi language to your tool?
If anyone sends me the Arabic images that you used for training, I will make the 
training file, because Persian and Arabic differ in only 4 characters: گ پ چ ژ

Original issue reported on code.google.com by reza.mos...@gmail.com on 16 Oct 2012 at 11:10
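
For reference, the four letters named above can be listed with their Unicode code points. This is only a minimal sketch using Python's standard `unicodedata` module; the names `PERSIAN_EXTRA_LETTERS` and `describe_letters` are made up for illustration and do not come from Tesseract.

```python
import unicodedata

# The four letters the comment above lists as Persian additions to the
# Arabic alphabet, written as escapes to avoid bidi display issues.
PERSIAN_EXTRA_LETTERS = ["\u06AF", "\u067E", "\u0686", "\u0698"]


def describe_letters(letters=PERSIAN_EXTRA_LETTERS):
    """Return (code point, Unicode name) pairs for the given letters."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in letters]


if __name__ == "__main__":
    for code_point, name in describe_letters():
        print(code_point, name)
```

A list like this could be used to check that a training text or a unicharset actually covers the Persian-specific letters.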

GoogleCodeExporter commented 9 years ago
Hi! I am also ready to collaborate on making Tesseract compatible with the 
Persian (fa) language. Also please note that, besides these characters, ۴ ۵ 
۶ (4 5 6 in Persian digits) are a bit different from their Arabic counterparts, 
but Arabic-Indic digits are also acceptable in Persian script.

Original comment by ebra...@byagowi.com on 16 Oct 2012 at 11:34
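
The digit point can be illustrated with a small sketch. Unicode encodes Arabic-Indic digits at U+0660 to U+0669 and the Persian forms (named "Extended Arabic-Indic" digits) at U+06F0 to U+06F9, so OCR output could be normalized to one set after recognition. The function name `to_persian_digits` is hypothetical, and whether to normalize at all is an assumption about what a Persian user would want.

```python
# Map each Arabic-Indic digit (U+0660..U+0669) to the Persian digit
# with the same value (U+06F0..U+06F9).
ARABIC_TO_PERSIAN_DIGITS = {0x0660 + i: 0x06F0 + i for i in range(10)}


def to_persian_digits(text: str) -> str:
    """Replace Arabic-Indic digits with Persian digits; leave the rest alone."""
    return text.translate(ARABIC_TO_PERSIAN_DIGITS)


if __name__ == "__main__":
    sample = "\u0664\u0665\u0666"  # Arabic-Indic 4 5 6
    print(to_persian_digits(sample))
```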

GoogleCodeExporter commented 9 years ago
Hi guys, 

Could you also throw this in, in addition to the Persian letters? ڭ

Anything new happening with Arabic OCR?

Original comment by j...@christianmissiontrips.org on 1 Nov 2012 at 9:25

GoogleCodeExporter commented 9 years ago
We have started building Persian training data at 
https://github.com/reza1615/PersianOcr 

Original comment by reza.mos...@gmail.com on 5 Nov 2012 at 7:42

GoogleCodeExporter commented 9 years ago
Here are the latest results of reza's work with Tesseract 3.02.2, which I put 
on GitHub: 
https://github.com/reza1615/PersianOcr/tree/master/Sample%20Test%20of%20Latest%20Version
They look promising, but we think there are some undocumented hints and tricks 
for training Tesseract on Arabic script. I believe Google's documentation is 
very poor on the points we must consider for training, and this is not cool for 
an open-source project. For example, it would be very helpful if you published 
the Arabic source files that you used for training Tesseract with the cube 
method.

Original comment by ebra...@byagowi.com on 5 Nov 2012 at 8:37

GoogleCodeExporter commented 9 years ago
Hi,
If you are going to make Farsi for us, then in addition to the letter 
differences between Farsi and Arabic that my friends mentioned in the comments 
above, I want to add another consideration:
in Farsi we don't have the letter: ي
instead we have: ی

Thanks

Original comment by abidiash...@gmail.com on 21 Dec 2012 at 8:28
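
The two letters in the comment above look similar but are distinct code points: Arabic yeh is U+064A and Farsi yeh is U+06CC. As a hedged sketch, text recognized with an Arabic model could be post-processed to use the Farsi form. The function name `normalize_yeh` is hypothetical, and replacing every Arabic yeh is an assumption that only suits text known to be Persian.

```python
ARABIC_YEH = "\u064A"  # ARABIC LETTER YEH
FARSI_YEH = "\u06CC"   # ARABIC LETTER FARSI YEH


def normalize_yeh(text: str) -> str:
    """Replace Arabic yeh with Farsi yeh throughout the text."""
    return text.replace(ARABIC_YEH, FARSI_YEH)


if __name__ == "__main__":
    word = "\u0641\u0627\u0631\u0633\u064A"  # ends in Arabic yeh
    fixed = normalize_yeh(word)
    print(f"U+{ord(word[-1]):04X} -> U+{ord(fixed[-1]):04X}")
```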

GoogleCodeExporter commented 9 years ago
I'm ready to collaborate on this project too.
The total number of Persian speakers is more than 110 million.
Also, there isn't any other OCR for it.

Original comment by intelsat...@gmail.com on 6 Jan 2013 at 1:01

GoogleCodeExporter commented 9 years ago
Reza, I want to train Tesseract. Can you tell me the steps?

Original comment by amir...@gmail.com on 25 Sep 2013 at 10:20

GoogleCodeExporter commented 9 years ago
Hello guys,

I'm trying to train Tesseract for Kurdish, which is useful for Persian too. 
Kurdish has some additional letters, but the way of writing is the same as in 
Arabic or Farsi. The problem I'm getting is that the final OCR result is not 
from right to left but from left to right, which means that you can't read the 
text, although the letters are correct. I use qt-box-editor to edit the boxes, 
then I use Serak Tesseract Trainer V0.4 to train the OCR, and finally I put the 
traineddata file in the Tesseract directory. Everything goes well except for 
the missing Arabic mechanism of writing from right to left.

Does anybody know this problem?

You can see the traineddata file I generated as an attachment.

Thanks a lot

Original comment by karo0...@gmail.com on 18 Oct 2013 at 7:27
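
If the symptom described above is that each output line holds the correct letters in reversed order, one possible stopgap is to reverse those lines after recognition. This is only a sketch under that assumption; it does not fix the training itself, the function names are hypothetical, and a plain reversal would also flip embedded numbers or Latin words and misplace combining marks.

```python
import unicodedata


def has_rtl_letters(line: str) -> bool:
    """True if the line contains any strongly right-to-left character.

    Arabic-script letters have bidi class "AL"; Hebrew letters have "R".
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in line)


def reverse_rtl_lines(text: str) -> str:
    """Reverse the character order of every line that contains RTL letters."""
    fixed = []
    for line in text.splitlines():
        fixed.append(line[::-1] if has_rtl_letters(line) else line)
    return "\n".join(fixed)


if __name__ == "__main__":
    reversed_word = "\u0645\u0627\u0644\u0633"  # letters in reversed order
    print(reverse_rtl_lines(reversed_word))
```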

GoogleCodeExporter commented 9 years ago
Hello,

It seems that to train the Arabic and Farsi languages with good precision, you 
need to train Tesseract's cube engine. Do you know how the cube engine can be 
trained? The main programmer of the cube engine is Ahmad Abdulkader, who is now 
at Facebook! 

Original comment by vahid.ke...@gmail.com on 5 Jul 2014 at 11:34

GoogleCodeExporter commented 9 years ago
@Vahid: I tried to use the cube engine, but it doesn't have any help or manual, 
so I couldn't train it properly. I sent many emails to the OCR developers, but 
nobody answered!

Original comment by reza.mos...@gmail.com on 6 Jul 2014 at 8:19

GoogleCodeExporter commented 9 years ago
Hello,

Other people (besides its programmers) have tried to make some sense of this 
cube engine, and they have written up their results here:
https://code.google.com/p/tesseract-ocr-extradocs/
Though I, for one, could not make sense of any of it.

Original comment by abidiash...@gmail.com on 6 Jul 2014 at 8:43

GoogleCodeExporter commented 9 years ago
Hello again,
I read the linked page. Unfortunately, it only introduces the engine and does 
not explain how to build anything with it, so it is not usable, and in many 
places it does not even give a precise introduction.
One of the links states that this method was removed from the core of the 
program because it is not open source.

Original comment by reza.mos...@gmail.com on 6 Jul 2014 at 9:19

GoogleCodeExporter commented 9 years ago
Ability to train the tesseract recognizer (but not cube) on several 
Arabic-based languages will be added to 3.04, and this problem may receive real 
attention for 3.05.

Original comment by theraysm...@gmail.com on 4 Nov 2014 at 7:03

GoogleCodeExporter commented 9 years ago
Does anybody know when Tesseract 3.04 is coming? I cloned reza's project and 
ran the training. Then I put per.traineddata into tessdata, but it didn't 
work. Could anybody send me a tested copy of per.traineddata?

Thanks in advance.

Original comment by e.velib...@gmail.com on 3 Feb 2015 at 2:44
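
When a traineddata file seems to have no effect, one thing worth ruling out is that the file is not where Tesseract looks for it. The sketch below checks that `per.traineddata` exists in a given tessdata directory before building the usual `tesseract <image> <outputbase> -l <lang>` command line. The function name and the `tessdata_dir` parameter are hypothetical, and this is not a diagnosis of the problem reported above.

```python
from pathlib import Path


def build_tesseract_command(image, output_base, lang, tessdata_dir):
    """Return the tesseract command line, after checking the language file.

    tessdata_dir is wherever this install keeps its *.traineddata files
    (an assumption; the location varies between installs).
    """
    traineddata = Path(tessdata_dir) / f"{lang}.traineddata"
    if not traineddata.is_file():
        raise FileNotFoundError(f"{traineddata} not found")
    return ["tesseract", str(image), str(output_base), "-l", lang]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Where Tesseract actually searches for the tessdata directory depends on the install and on the `TESSDATA_PREFIX` environment variable, so the directory passed here has to match that.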