AmitGorvadiya / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

dawg.cpp #321

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Compile the project under VS 2008.
2. Copy the DLLs to the \bin.dbg folder.
3. Run "tesseract fontfile.tif fontfile.txt -l mylang".

What is the expected output? What do you see instead?
The expected output is fontfile.txt in bin.dbg. Instead I see:

E:\tessr400\tesseract-ocr\bin.dbg>tesseract fontfile.tif fontfile.txt -l mylang
unicharset_size > 0:Error:Assert failed:in file e:\tessr400\tesseract-ocr\dict\dawg.cpp, line 140
A Windows error message is also shown (dwwin.exe).

What version of the product are you using? On what operating system?

Tesseract r400, downloaded from svn
Windows XP
Please provide any additional information below.

Original issue reported on code.google.com by rasm...@gmail.com on 16 Jun 2010 at 6:00

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Please forward the TIF file along with <mylang>.traineddata for testing on Windows XP. Is
tesseract.exe a debug or a release build?
-Withblessings@gmail.com

Original comment by withbles...@gmail.com on 16 Jun 2010 at 7:35

GoogleCodeExporter commented 9 years ago

Original comment by rasm...@gmail.com on 17 Jun 2010 at 7:06

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Sorry for the late reply. tesseract.exe is the debug version. I had excluded the
unicharambigs and mal.config files from the traineddata.

Original comment by rasm...@gmail.com on 17 Jun 2010 at 7:15

GoogleCodeExporter commented 9 years ago
I downloaded your TIF and traineddata (all for the Malayalam language) and tested them using the release version on Windows XP.
I observed that your TIF is 72 dpi, so I increased it to 300 dpi using IrfanView.
I ran "tesseract mal.meera.01.tif test -l mal". The output, test.txt, appears to be in order. I don't know Malayalam, only Kannada. test.txt is uploaded.
It appears that the debug version of tesseract has some problem when I run it following your method: a Windows error message about the exe is displayed, but no error is displayed in the command prompt.
Please test with both the release version and the debug version. If the release version is OK, then the debug version has a problem.

Original comment by withbles...@gmail.com on 17 Jun 2010 at 8:27

Attachments:

GoogleCodeExporter commented 9 years ago
The assertion that's failing here basically checks that the unicharset has been 
loaded - that's not happening here. I'll look into it, but I suspect that this 
is because of some change in the unicharset format.

Original comment by joregan on 17 Jun 2010 at 11:31

GoogleCodeExporter commented 9 years ago
Can you test it on a recent svn revision? It works for me on r521 with the files you
provided.

Original comment by zde...@gmail.com on 17 Nov 2010 at 8:27

GoogleCodeExporter commented 9 years ago
@zde,
I downloaded svn r525 on Ubuntu and then transferred it to a Windows XP folder, since I do
not know how to check out svn on Windows XP.
1) As requested, I checked with the debug version of tesseract.exe. I tested with
phototest.tif, mal.meera.01.tif and kan1.tif, and all outputs were in order and clear.
I faced no problems.

2) I also tested with the release version of tesseract.exe, using phototest.tif and
kan1.tif. All output files were clear, with no problems. For mal.tif, however, it
failed to generate output and a Windows error message appeared.
3) mal.meera.01.tif was also tested with the release version, but it failed with the
same Windows error message (see the attached screenshot).
Since mal.tif was 72 dpi, I increased it to 200 and 300 dpi using IrfanView and saved it
as an uncompressed TIF file, but it still produces the Windows error message. I cannot
understand why this happens only for mal.tif, whereas the TIF files for other
languages, viz. phototest.tif and kan.tif, work fine without any error message.
With regards, 
-sriranga(78yrsold)

Original comment by withbles...@gmail.com on 18 Nov 2010 at 2:11

Attachments:

GoogleCodeExporter commented 9 years ago
Can you please try re-training with the 3.01 release and then also with the current svn
revision? There is at least one more step, shapeclustering; see the example at
http://code.google.com/p/tesseract-ocr/issues/detail?id=430#c7

Original comment by zde...@gmail.com on 22 Feb 2012 at 9:14

GoogleCodeExporter commented 9 years ago
As requested, I checked with version 3.01; the attached files are
self-explanatory.
Feedback regarding 3.02 (r679) follows in the next email.
The font_properties file gave a lot of trouble, hence the delay in giving you feedback.
The box/tif file was generated in the jboxeditor tool, based on the test.txt file attached
to issue 321.

Original comment by withbles...@gmail.com on 25 Feb 2012 at 2:50

Attachments:

GoogleCodeExporter commented 9 years ago
Zdenko,
As requested, I also checked under the new version 3.02 (up to r679); see the
attached files, which are self-explanatory. Is any more information required?
For your information, I don't know Malayalam script. Comparing the image
file with the output file, the output appears to be correct except for one image [26/മ]
(see the attached tesseract.log file; a similar log was generated for version 3.01 as well).
Even editing the box file in owler had no effect. Kindly advise how to edit it, for
future reference.
With warmest Regards,
-sriranga(79yrs)

Original comment by withbles...@gmail.com on 25 Feb 2012 at 2:56

Attachments:

GoogleCodeExporter commented 9 years ago
@withblessings: the report is about an error in dawg.cpp. As far as I can see, you did
not bother with dawg... So this is not about box editing; it is about dictionary
creation. I would prefer that somebody who knows Malayalam script test it.

Original comment by zde...@gmail.com on 25 Feb 2012 at 7:23

GoogleCodeExporter commented 9 years ago
You might check this; it worked for me in Bengali.

http://www.sk-spell.sk.cx/tesseract-ocr-en-can-i-use-my-data-for-204

Original comment by sagnik1...@gmail.com on 23 May 2012 at 2:08

GoogleCodeExporter commented 9 years ago
closing based on data in Comment 12

Original comment by zde...@gmail.com on 24 Jul 2012 at 6:13