Hi,
The recognition of the text in OCRFeeder depends on the engine you use.
If you use Tesseract, edit it in Tools->OCR Engines and add this to the end of
the Arguments field:
-l de
That should do the job.
Original comment by joaquimr...@gmail.com
on 25 Jan 2010 at 7:51
Hi!
Thanks for your answer, but it doesn't work. The actual option for German in
tesseract-ocr is
-l deu
but the umlauts aren't recognized that way. gocr doesn't show umlauts either.
I'll check with the new version asap.
Any ideas how cuneiform-linux could be used? (But that's probably worth another
Issue)
so long
hank
Original comment by hanksch...@googlemail.com
on 25 Jan 2010 at 10:24
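For reference, the language option discussed above can be sketched as a plain Tesseract call from Python. The helper names are illustrative and not part of OCRFeeder; the sketch only assumes the usual `tesseract IMAGE OUTPUTBASE -l LANG` command form, which writes the recognized text to `OUTPUTBASE.txt`, and that the German language data is installed.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, lang="deu"):
    """Build the argument list for a Tesseract run (hypothetical helper)."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def recognize(image, output_base, lang="deu"):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract binary and the requested language data exist.
    """
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base; read it as UTF-8 explicitly.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `recognize("scan.png", "scan")` would then correspond roughly to an Arguments field of `$IMAGE $FILE -l deu` followed by reading `$FILE.txt`.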
Hi again!
No change with version 0.6. An example:
Akzeptanz von jugendlichen im öffentlichen Raum geworben. Ãber den
Förderfonds werden gezielt jugendprojekte entwickelt und unterstützt.
It should read:
Akzeptanz von jugendlichen im Öffentlichen Raum geworben. Über den
Förderfonds werden gezielt jugendprojekte entwickelt und unterstützt.
This is what I used in the arguments field:
$IMAGE $FILE -l deu ; cat $FILE.txt
Any chance of fixing that? OCRFeeder is using UTF-8, if I read the code
correctly; is there a way to use something else instead?
so long
hank
Original comment by hanksch...@googlemail.com
on 26 Jan 2010 at 4:40
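The garbled "Ãber" in the sample is what UTF-8 text typically looks like when its bytes are decoded as Latin-1. The short Python sketch below reproduces that pattern. It is only a guess at the mechanism that is consistent with the sample, not a confirmed diagnosis of OCRFeeder's code.

```python
expected = "Über den Förderfonds"

# Encode as UTF-8, then (wrongly) decode the same bytes as Latin-1.
garbled = expected.encode("utf-8").decode("latin-1")

# Each umlaut turns into "Ã" followed by a second character; for "Ü" that
# second character is an invisible control code, so it displays as "Ãber".
print(repr(garbled))

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
assert repaired == expected
```

If this is what happens, the fix belongs where the engine output is read or displayed, not in the Tesseract arguments.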
Hi,
I think you may be right. It might be a problem with the encoding when reading
the file or displaying the contents. UTF-8 itself, though, should not be the
problem.
I'll address that as soon as I can.
Thank you,
Original comment by joaquimr...@gmail.com
on 26 Jan 2010 at 4:44
Hi hank,
Could you please attach a file with an example of the text with umlauts that is
failing for you so I can focus on a real example and fix it?
Thank you,
Original comment by joaquimr...@gmail.com
on 3 Mar 2010 at 2:07
Hi Hank,
Even though you haven't sent the file I asked for in my previous message, I
think I have fixed the issue. It turns out that the encoding of the text was
working only for Ocrad and not working well for any other engine.
I fixed this, and now the engines' output is supposed to be in UTF-8 (many
engines allow a parameter to set this).
This will be available on the next release.
Original comment by joaquimr...@gmail.com
on 4 Mar 2010 at 2:42
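The idea of treating every engine's output as UTF-8 can be sketched as follows. This is not the actual OCRFeeder change; `run_engine` is a hypothetical helper, and a small Python one-liner stands in for a real OCR engine so the example is self-contained.

```python
import subprocess
import sys


def run_engine(command, encoding="utf-8"):
    """Run an engine command and decode its stdout with an explicit encoding.

    Decoding the raw bytes here avoids depending on the locale default.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode(encoding)


# Stand-in "engine": writes the UTF-8 bytes for "Über" to stdout.
fake_engine = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\xc3\\x9cber')",
]
text = run_engine(fake_engine)
```

An engine that emits another encoding would need that encoding passed explicitly, which is why the parameter is exposed instead of hard-coded.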
Hi!
Sorry, didn't get to it yesterday - do you still need it?
It would be great if you could fix that one! Do you have an SVN version for
testing?
so long
hank
Original comment by hanksch...@googlemail.com
on 4 Mar 2010 at 4:33
Hi Hank,
I got a git version :)
http://git.gnome.org/browse/ocrfeeder/
Let me know if this version already works for you.
Original comment by joaquimr...@gmail.com
on 4 Mar 2010 at 4:36
Hi!
I've found it, but it doesn't work on my new machine (AMD quad-core, using
Ubuntu 9.10, 32-bit version).
I get
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/studio/widgetModeler.py", line 370, in performBoxDetection
    self.performBoxDetectionForReviewer(image_reviewer)
  File "/usr/lib/pymodules/python2.6/studio/widgetModeler.py", line 384, in performBoxDetectionForReviewer
    image_processor = ImageProcessor(image_reviewer.path_to_image, window_size)
  File "/usr/lib/pymodules/python2.6/feeder/imageManipulation.py", line 40, in __init__
    raise ImageManipulationError
feeder.imageManipulation.ImageManipulationError
with both 0.6.0 and the git version. Is something missing? I'll try on my "old"
machine, where at least 0.6 was working (well, kind of ;-) )
so long
hank
Original comment by hanksch...@googlemail.com
on 4 Mar 2010 at 5:37
Hi,
Could you please attach the file you are trying to recognize? (so I can check
if I get the same error)
Original comment by joaquimr...@gmail.com
on 4 Mar 2010 at 6:04
Hi!
This is rather weird...
I gave it a try on the old machine, and it looks like ocrfeeder doesn't like my
standard .tif scans produced by xsane... PNG pictures work out just fine! Great!
All umlauts are recognised!
I attach the non-working .tiff, and the same in png.
so long
hank
Original comment by hanksch...@googlemail.com
on 4 Mar 2010 at 6:48
Oops, attached the same file twice. Here's the PNG ...
Original comment by hanksch...@googlemail.com
on 4 Mar 2010 at 6:50
Hi Hank,
So, the problem was that the TIFF image you're using is encoded in a way that
is not supported by the Python Imaging Library, and when OCRFeeder attempts to
open it, it gives that error.
I tried converting it to "tiff" (I know...) using ImageMagick and it then
works; ImageMagick might use a different compression algorithm.
To convert this image the way I did, simply enter:
$ convert test0001.tiff right_test.tiff
It's amazing that, considering all the images I have tried with OCRFeeder and
all its users, no such error has ever been reported. I wonder how you are
creating that image.
I'll close this as fixed because I have fixed the umlaut cases, and the image
format problem is not a common use case; besides, one can always convert the
images.
Nonetheless, when this occurs now, a warning dialog will pop up telling the
user that an error occurred and that the image should be converted to an
appropriate format.
Cheers,
Original comment by joaquimr...@gmail.com
on 5 Mar 2010 at 2:11
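The workaround described above (check whether the imaging library can read the file, and re-encode it with ImageMagick if not) can be sketched in Python. This is not OCRFeeder's code: the function names are hypothetical, the check assumes Pillow is installed and that an unreadable file raises `OSError`, and the conversion assumes ImageMagick's `convert` is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, dst):
    """Argument list equivalent to: convert test0001.tiff right_test.tiff"""
    return ["convert", str(src), str(dst)]


def pil_can_open(path):
    """Return True if Pillow can fully decode the image (assumes Pillow)."""
    from PIL import Image  # imported lazily so the rest works without Pillow

    try:
        with Image.open(path) as img:
            img.load()  # force decoding; some errors only appear here
        return True
    except OSError:
        return False


def reencode_tiff(src, dst):
    """Re-encode an image with ImageMagick and return the new path."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found on PATH")
    subprocess.run(build_convert_command(src, dst), check=True)
    return Path(dst)
```

A caller could test `pil_can_open("test0001.tiff")` first and only call `reencode_tiff("test0001.tiff", "right_test.tiff")` when that returns False.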
I mistakenly set it as won't fix...
Setting as fixed now..
Original comment by joaquimr...@gmail.com
on 5 Mar 2010 at 2:12
Hi!
The TIFF was a standard scan from xsane; actually, I had problems with that
format before when using tesseract, and had to convert those files too (I
trained tesseract to recognise an old Latin-German dictionary).
I thought that was a tesseract-only problem, but the same message appeared when
trying to use ocrad as the engine for ocrfeeder.
thanks!
so long
hank
Original comment by hanksch...@googlemail.com
on 5 Mar 2010 at 4:09
Original issue reported on code.google.com by
hanksch...@googlemail.com
on 6 Dec 2009 at 5:44