AiPacino / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

tesseract - cannot open input file #1208

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Convert the PDF to TIFF with ImageMagick, command: convert -density 300 -depth 4 "eng.Belwe Bd BT.exp0.pdf" "eng.Belwe Bd BT.exp0.tiff"
2. Try to make a box file with the command: tesseract "eng.Belwe Bd BT.exp0.tif" "eng.Belwe Bd BT.exp0" batch.nochop makebox
3. This causes the error: Cannot open input file: eng.Belwe Bd BT.exp0.tif

Running version 3.02 on Windows 8.

I'm new to this and I'm fairly sure I'm doing something obviously wrong, but 
here is what I did check: the spelling, many times. The file name is spelled 
correctly, and the file is in my "C:\Program Files (x86)\Tesseract-OCR" 
directory, which is also where I'm running the command from.

I was following a guide step by step and got stuck here. I was wondering about 
the 'tiff' with two f's in the convert command versus the single f in the 
tesseract command. But if I use two f's in the tesseract command, I just get 
an "Unsupported image type" error. I also get an image type error if I change 
it to one f in the convert command.

Am I putting my .tif file in the wrong place?
I hope someone can help, this is an annoying place to be stuck :)

Regards

Original issue reported on code.google.com by claeshj...@gmail.com on 23 May 2014 at 12:03

GoogleCodeExporter commented 9 years ago
I have something to add: I tried the same command with a test file (this time 
a .jpg) in the same directory as the .tif, and it worked fine. So I'm thinking 
maybe there is a problem with the handling of .tif files? Or maybe a problem 
in the conversion to .tif using ImageMagick (even though it gave me no 
errors).

Original comment by claeshj...@gmail.com on 23 May 2014 at 9:59

GoogleCodeExporter commented 9 years ago
The error message "Cannot open input file" comes from your OS. It has nothing 
to do with tesseract or the image format. In other words, Windows 8 was not 
able to open the file for reading... You are the only one who can find out why.
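For anyone landing here with the same message, a quick way to narrow it down is to ask the OS directly whether the exact path exists and is readable, and to list files that share the same base name. The sketch below is only an illustration: `diagnose_input` is a hypothetical helper, not part of tesseract, and it assumes the mismatch is in the extension only.

```python
import os


def diagnose_input(path):
    """Report whether `path` can be opened and list files with the same stem."""
    directory = os.path.dirname(os.path.abspath(path))
    stem = os.path.splitext(os.path.basename(path))[0]

    same_stem_files = []
    if os.path.isdir(directory):
        # Collect every file whose name matches up to the final extension,
        # e.g. "x.tif", "x.tiff" and "x.pdf" all share the stem "x".
        same_stem_files = sorted(
            name for name in os.listdir(directory)
            if os.path.splitext(name)[0] == stem
        )

    return {
        "exists": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
        "same_stem_files": same_stem_files,
    }
```

If `exists` is false while `same_stem_files` shows a `.tiff` next to the `.tif` that was requested, the name given to tesseract simply does not match the file that convert wrote.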

tiff and tif are equivalent (like jpg and jpeg). 

leptonica (the library that is used for opening images) is able to work with 
most TIFF formats, and it would report a different error message than "Cannot 
open input file" if there were a problem with the image format.

IMO using spaces in a filename is a very bad habit, especially if the filename 
is used in a console.
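As a rough illustration of that advice, the sketch below shows two ways to keep spaces from causing trouble when scripting the call: strip them from the name, or pass the arguments as a list so no shell quoting is involved. The helper names `without_spaces` and `build_makebox_command` are hypothetical. The tesseract arguments are the ones from the original report.

```python
import os


def without_spaces(filename, replacement="_"):
    """Return a copy of `filename` with every space replaced."""
    return filename.replace(" ", replacement)


def build_makebox_command(image_path):
    """Build the makebox call as an argument list (no shell quoting needed)."""
    # The output base is the image path minus its final extension.
    output_base = os.path.splitext(image_path)[0]
    return ["tesseract", image_path, output_base, "batch.nochop", "makebox"]
```

The list can be handed to `subprocess.run(command, check=True)`, which passes each element as one argument even when it contains spaces.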

Original comment by zde...@gmail.com on 24 May 2014 at 2:25

GoogleCodeExporter commented 9 years ago
Issue 1221 has been merged into this issue.

Original comment by zde...@gmail.com on 28 May 2014 at 1:30

GoogleCodeExporter commented 9 years ago
We have run into the same problem.
It seems the issue has something to do with ImageMagick. For some reason the 
generated TIFF cannot be read by tesseract. 

I tried opening the image in a viewer (IrfanView), which works fine. I then 
saved the image again from IrfanView with the default options. This resulted 
in a TIFF half the size, which tesseract could read.

So I'm guessing there is something exotic in the way ImageMagick saves a 
TIFF. Maybe this can be solved by changing ImageMagick settings, but I'm not 
sure exactly how.
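One way to see what actually differs between the two files is to dump a few header tags from each and compare them. The sketch below is a minimal reader for the first image directory of a classic TIFF, using only the Python standard library. It decodes only SHORT and LONG fields for a handful of tags, and `read_tiff_tags` is a hypothetical helper. Which tag values tesseract 3.02 fails on is not established in this thread, so treat the output as a starting point for comparison only.

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
TAGS_OF_INTEREST = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    277: "SamplesPerPixel",
}

# Field types decoded here: 3 = SHORT (2 bytes), 4 = LONG (4 bytes).
TYPE_FORMATS = {3: ("H", 2), 4: ("I", 4)}


def read_tiff_tags(data):
    """Return {tag_name: tuple_of_values} for the first image directory."""
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file (bad byte-order mark)")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (magic number is not 42)")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        # Each directory entry is 12 bytes: tag, type, count, value-or-offset.
        start = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag not in TAGS_OF_INTEREST or field_type not in TYPE_FORMATS:
            continue
        code, size = TYPE_FORMATS[field_type]
        if size * count <= 4:
            # Values that fit in 4 bytes are stored inline in the entry.
            value_start = start + 8
        else:
            # Otherwise the last 4 bytes hold an offset to the values.
            (value_start,) = struct.unpack(endian + "I", data[start + 8:start + 12])
        values = struct.unpack(
            endian + code * count,
            data[value_start:value_start + size * count],
        )
        tags[TAGS_OF_INTEREST[tag]] = values
    return tags
```

Reading both files as bytes (`open(path, "rb").read()`) and printing the two results side by side would show whether BitsPerSample or Compression differ. In TIFF 6.0, a Compression value of 1 means uncompressed and 5 means LZW.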

Original comment by dimitri....@gmail.com on 10 Apr 2015 at 8:04