Linux-Mono: crash when trying to use tesseract

GoogleCodeExporter commented 9 years ago

SE version 3.1 crashes on Linux when trying to use Tesseract for OCR.
The error output is attached.

Original issue reported on code.google.com by hawk...@gmail.com on 1 Apr 2011 at 8:53

Attachments:

se-output-tesseract-crash.log

GoogleCodeExporter commented 9 years ago

I don't think SE works on Linux: Tesseract, the video player and Hunspell all 
need fixing, and I do not know enough about Linux for this :(

But I'll be happy to apply any Linux patches!

Original comment by nikse.dk@gmail.com on 1 Apr 2011 at 10:18

GoogleCodeExporter commented 9 years ago

Attached please find a patch which fixes Tesseract and disables Hunspell on 
Linux.

What it does: if the included Tesseract cannot be run, SE tries to run 
tesseract with no path. This works just fine as long as the system has a copy 
of Tesseract installed. If there is no system copy installed, SE will simply 
generate all-blank subtitles. The patch also turns off UseShellExecute, 
because that makes Mono try to use xdg-open, which is for opening files and 
URLs. I haven't tested that part on Windows, but it should work.
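
The fallback described above can be sketched roughly as follows. This is an illustration in Python, not the patch itself (SE is written in C#). The function name run_first_available and the example paths are made up for the sketch. Passing the command as an argument list with no shell is only a rough analogue of turning off UseShellExecute.

```python
import subprocess


def run_first_available(candidates, args):
    """Run the first candidate executable that can be started.

    `candidates` is an ordered list such as
    ["<SE folder>/Tesseract/tesseract", "tesseract"] (hypothetical paths):
    the bundled copy first, then the bare name, which is resolved via PATH.
    Returns the CompletedProcess, or None if nothing could be started.
    """
    for exe in candidates:
        try:
            # A plain argument list with no shell starts the program directly
            # instead of handing the path to a file/URL opener.
            return subprocess.run([exe, *args], capture_output=True, text=True)
        except OSError:
            # Missing file, not executable, or wrong binary format: try next.
            continue
    return None
```

A None result corresponds to the "no system copy installed" case. The caller has to decide what to do with it; with the patch as described, that case ends in all-blank subtitles.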

However, it only works with Tesseract 3.0, because 2.0 requires that the file 
have a three-letter extension (.tif) instead of .tiff. That would be a simple 
fix, but Tesseract 2.0 also only works with 2-, 4-, 6- or 8-bit-per-pixel TIFF 
images. The temp images that SE creates are 32-bit-per-pixel and so won't work.

Tesseract 3.0 is not yet included in Debian/Ubuntu, so it might be worth 
converting the temp images to 8-bit-per-pixel .tif files before saving them, 
so that OCR works on those very common distros. But I'm not up to figuring 
that out, and am satisfied with installing Tesseract 3.0 myself.
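
For anyone who does want to try the conversion, the pixel arithmetic for going from 32-bit to 8-bit grayscale could look something like the sketch below. It is Python for illustration only (SE would do this in C# on its own bitmaps) and does not cover writing the .tif file. The function name rgba_to_gray8 and the choice to blend transparent pixels onto white are assumptions, not anything SE is known to do.

```python
def rgba_to_gray8(pixels, background=255):
    """Convert (r, g, b, a) tuples (8 bits per channel) to 8-bit gray values.

    Transparent areas are blended onto `background` (white by default),
    which is an assumption about how the subtitle bitmaps should look.
    """
    gray = []
    for r, g, b, a in pixels:
        # Blend each channel onto the background according to alpha.
        r, g, b = ((c * a + background * (255 - a)) // 255 for c in (r, g, b))
        # Integer ITU-R BT.601 luma weights (0.299, 0.587, 0.114).
        gray.append((299 * r + 587 * g + 114 * b) // 1000)
    return gray
```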

Another thing which might be nice would be to have SE detect that Tesseract 
is missing and disable that OCR option (presumably allowing 
image-recognition only).
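
A minimal sketch of that detection idea, again in Python and not SE's actual C# code: look the executable up on PATH and only offer the Tesseract option if it is found. The function names and the option labels are hypothetical.

```python
import shutil


def tesseract_available(name="tesseract"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def available_ocr_methods(name="tesseract"):
    """List the OCR options a UI could offer (labels are hypothetical)."""
    methods = ["image recognition"]
    if tesseract_available(name):
        methods.append("tesseract")
    return methods
```

This only checks the system copy. A bundled copy would need its own check before the option is hidden.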

The other half of the patch simply allows the loading of hunspell to fail.  
Naturally this means no spellchecking, and I haven't tested it thoroughly.  But 
it at least allows SE to proceed through the OCR process.

Original comment by hawk...@gmail.com on 7 Apr 2011 at 10:34

Attachments:

fix-tesseract-and-hunspell.patch

GoogleCodeExporter commented 9 years ago

Thanks, applied in r387. Is it still working?

Note: I used Utilities.IsRunningOnLinux() to detect Linux...

Do you have any idea if VLC lib can work with C#/Mono? Or should another player 
be used?

Original comment by nikse.dk@gmail.com on 8 Apr 2011 at 1:25

GoogleCodeExporter commented 9 years ago

Original comment by nikse.dk@gmail.com on 13 Apr 2011 at 6:55

Changed state: Fixed
