Closed wtester7 closed 4 years ago
The .mks file works fine here... what's your OS and what's the problem?
Hi,
I've tested the MKS file and I get the same error on Win 10 x64 1909. I saw the "Tesseract returned with code" message for the first time with SE 3.5.12, so it looks like it must have something to do with Tesseract 4.1.1.
This is a serious problem because I tested a bit more and came to the same result, i.e. the Tesseract error.
This caused a BSOD on my Win10 machine: I had Photoshop open with 70% RAM usage, plus my Vivaldi browser and various tabs open, with about 80% RAM usage. I had about 1 gigabyte of RAM left, and a minute after I OCR'ed the .mks file and the error appeared, I got a BSOD.
This suggests there is a memory leak that fills up the RAM until the BSOD happens...
What happens if you run tesseract.exe via the cmd line?
I've tested on three different computers (one is a clean pc with just Visual Studio) and all are working fine... could perhaps be something with "Visual C++ Redistributable"?
To which folder / how did you install SE?
It's also working fine here. Win10 x64, 1909 Build 18363.5592. Maybe try using LSTM only?
I tried with Tesseract 4.1.0 and 4.1.1 (Original Tesseract only) on Win7. The results are identical.
OK guys, I have pinned the error down: xylographe has the right setting!
The following Tesseract 4.1.1 settings in SE 3.5.13 are working properly without errors!:
If I choose LSTM Only, LSTM + Tesseract or Default, the fatal error will be triggered, only Original Tesseract only is working!!
All Tesseract 4.1.0 OCR modes in SE 3.5.11 are working without any errors!!!
@niksedk I only use portable programs, and I have all the Visual C++ redistributables installed, so that is not the problem. The Tesseract 4.1.1 modes are the problem: only the "Original Tesseract only" mode is working fine!
As you read above, vivadavid also has the same problem!
@vivadavid can you please test with these settings:
and see if it's also the only mode that is working properly for you?
Hi, using the settings suggested by @wtester7, things worked.
None of the other three modes worked.
I'm also using the portable version. As for Visual C++, I have the x86 and x64 versions of C++ 2013 and C++ 2015-2019.
I have now tested "LSTM Only" and "LSTM + Tesseract" too. I also tried disabling "Italic" and/or "Music symbol". All modes/combinations generate acceptable results, though it would appear (first impression only) that the best result (the fewest errors) is generated by "Tesseract only".
Just a wild guess. Is there any chance that the problems are caused by a missing osd.traineddata?
Nope! I am very sure something changed between Tesseract 4.1.0 and 4.1.1 that is causing the problems.
Why is it that only the "Original Tesseract only" mode is working properly and the other 3 modes are defective? How do the other 3 modes differ? And why do all modes of Tesseract 4.1.0 in SE 3.5.11 work without any errors!?
I have an idea how to further pinpoint the problem. Could @niksedk or another dev compile Tesseract 4.1.1 for SE 3.5.11 so I can test it and see if it's working properly? We can then confirm or rule out that the changes made between 3.5.11 and 3.5.13 are causing the problem. If the commits are not at fault, then it's Tesseract 4.1.1.
In the folder where you have extracted SE 3.5.11 you open the Tesseract410 folder. You rename the tesseract.exe file in that folder to tesseract.exe.410. Copy or link the new Tesseract 4.1.1 executable to that folder, and start SE.
Ok, so it really is the new Tesseract 4.1.1. I have just tested it and in SE 3.5.11 it is the same problem! Only the mode "Original Tesseract Only" is working fine, all other modes ( Default, LSTM Only, LSTM + Tesseract ) are crashing!
I have also tried the other way around, put Tesseract 4.1.0 in SE 3.5.13 and all modes are working fine!
For now, anyone experiencing problems with Tesseract 4.1.1 can download Tesseract410.tar.gz and replace the new tesseract.exe in SE-3.5.13 (the containing folder is named Tesseract411) with the old tesseract.exe.
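The swap described above could be scripted roughly as follows. This is only a sketch: the function name `swap_executable` and the `.411` backup suffix are assumptions made for illustration (the suffix mirrors the `.410` rename suggested earlier), and it only replaces `tesseract.exe`, leaving the rest of the folder untouched.

```python
import shutil
from pathlib import Path


def swap_executable(tesseract_dir, replacement_exe, backup_suffix=".411"):
    """Back up tesseract.exe in tesseract_dir, then copy replacement_exe in its place.

    Returns the path of the backup file. The backup suffix is an assumption,
    not something Subtitle Edit itself expects or uses.
    """
    tesseract_dir = Path(tesseract_dir)
    current = tesseract_dir / "tesseract.exe"
    backup = tesseract_dir / ("tesseract.exe" + backup_suffix)

    if not current.is_file():
        raise FileNotFoundError(current)
    if backup.exists():
        # Refuse to clobber an earlier backup of the shipped binary.
        raise FileExistsError(f"{backup} already exists; not overwriting it")

    current.rename(backup)                  # keep the binary that shipped with SE
    shutil.copy2(replacement_exe, current)  # put the older binary in its place
    return backup
```

It would be called with the Tesseract411 folder inside the SE directory and the path to the tesseract.exe extracted from Tesseract410.tar.gz. Renaming the backup back to tesseract.exe undoes the change.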
Thank you, @xylographe. I've just tried your suggestion and it works!
First, I deleted the
My next step was to restore the
SE still shows Tesseract 4.1.1, while the file tesseract.exe corresponds to Tesseract 4.1.0 and the trained data corresponds to 4.1.1. Is that correct?
@wtester7 / @vivadavid: what happens if you run tesseract 4.1.1 in the command prompt? See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
cmd: tesseract 00003.mks test.srt --oem 1 -l eng
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Error during processing.
cmd: tesseract 00003.mks test.srt --oem 0 -l eng
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Error during processing.
It doesn't like the .mks file, tried with a .jpg and it works...
@wtester7: Could you try a .png image with all engine modes in cmd line (via the "oem" parameter) ?
"--oem 0" Legacy engine only.
"--oem 1" Neural nets LSTM engine only.
"--oem 2" Legacy + LSTM engines.
"--oem 3" Default, based on what is available.
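To try all four engine modes in one go, something like the following Python sketch might help. It assumes `tesseract` is on the PATH; the helper names `build_command` and `run_all_modes` and the `oem0` ... `oem3` output names are made up for this example. It records the exit code and the size of the text file each mode produces, so an empty result is easy to spot.

```python
import shutil
import subprocess
from pathlib import Path

# Engine modes as listed above (values for tesseract's --oem option).
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def build_command(image, out_base, oem, lang="eng", exe="tesseract"):
    """Return the argument list for a single tesseract run."""
    return [exe, str(image), str(out_base), "--oem", str(oem), "-l", lang]


def run_all_modes(image, lang="eng", exe="tesseract"):
    """Run every engine mode; return {oem: (exit code, output size in bytes)}."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    results = {}
    for oem in OEM_MODES:
        out_base = Path(f"oem{oem}")            # tesseract appends .txt itself
        out_file = out_base.with_suffix(".txt")
        out_file.unlink(missing_ok=True)        # drop stale output from earlier runs
        proc = subprocess.run(
            build_command(image, out_base, oem, lang, exe),
            capture_output=True,
            text=True,
        )
        size = out_file.stat().st_size if out_file.exists() else 0
        results[oem] = (proc.returncode, size)
    return results
```

Calling `run_all_modes("image.png")` and printing the returned dictionary gives one line of evidence per mode; a size of 0 corresponds to the "empty text file" case.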
I had never used the command line with Tesseract, but I gave it a try with a PNG file containing English text.
Using oem 1, the output file was empty: tesseract image.png output --oem 1 -l eng
Using oem 0, it worked: tesseract image.png output --oem 0 -l eng
Then I tried with the MKS file.
Using either oem 0 or oem 1, I got the same error message ("Error during processing"):
tesseract 00003.mks output --oem 1 -l eng
tesseract 00003.mks output --oem 0 -l eng
Sorry, I've just seen your message, @niksedk
With the PNG file, I tried --oem 2 and --oem 3: in both cases, I got no error message and the file was empty.
tesseract image.png output --oem 2 -l eng
tesseract image.png output --oem 3 -l eng
tesseract test.png test.srt --oem 0 -l eng = works, written ocr file!
tesseract test.png test.srt --oem 1 -l eng = empty text file
tesseract test.png test.srt --oem 2 -l eng = empty text file
tesseract test.png test.srt --oem 3 -l eng = empty text file
Same results as @wtester7.
Just an idea to consider: I have a pretty old first-generation 6-core i7-970 CPU that is 10 years old but still rocking (waiting for Zen3/4 to upgrade ;-)). Maybe some features/changes added in the latest Tesseract 4.1.1 (since 4.1.0) are not compatible with old CPUs?
Edit: But then, why is Original Tesseract/Legacy mode working and the other 3 engine modes not?!
I found this information on https://github.com/UB-Mannheim/tesseract/wiki:
"We don't provide an installer for Tesseract 4.1.0 because we think that the latest version 5.0.0-alpha is better for most Windows users in many aspects (functionality, speed, stability). Version 4.1 is only needed for people who develop software based on the Tesseract API and who need 100 % API compatibility with version 4.0."
There's something going on with Tesseract 4.1.(x) if they don't recommend using it. The funny thing is that version 4.1.0 didn't cause any trouble, as opposed to 4.1.1
Like @wtester7, my CPU is also quite old: a first generation Intel Core i7.
Hi!
I've tested the issue on 3.5.14 and it's still there. Judging by the changelog, this was expected, but I wanted to try anyway.
Will it be fixed in a future release of Subtitle Edit or do we need to wait for the next version of Tesseract? In the meantime, it might make sense to go back to Tesseract 4.1.0, which didn't show this problem.
I think they added some cpu optimizations without checking for support in 4.1.1 - perhaps AVX (it also does run much faster than 4.1.0).
So you can use 3.02 or do the hack of copying the 4.1.0 files to the 4.1.1 folder in SE - see https://github.com/SubtitleEdit/subtitleedit/issues/3933#issuecomment-576618240
Compiling tesseract is extremely difficult.
Thank you, @niksedk . I think they have been working on Tesseract 5 for quite some time now, which will hopefully make things easier.
Does the Tesseract 5 alpha work for you? ( used the one from here: https://github.com/UB-Mannheim/tesseract/wiki ) https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.14/SubtitleEditBeta.zip
I got an error message @niksedk .
@vivadavid / @wtester7: Can you run "tesseract.exe" via command line from this package: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w32-setup-v5.0.0-alpha.20200223.exe ( for more info see https://github.com/UB-Mannheim/tesseract/wiki )
Hi, @niksedk
I've tried with every value for --oem and I keep getting an error message.
@vivadavid: Was that via tesseract command line and the Mannheim version?
Like tesseract a.png out -l eng?
My guess is that some CPU optimization is enabled that your CPU does not have...
@niksedk , I installed the programme from the link you gave me and typed commands like this:
tesseract image.png output --oem 0 -l eng
You may be right about the CPU optimization, because my laptop is very old: 1st generation Intel Core.
@vivadavid: thx for testing :)
You are welcome :-) , @niksedk
Thank you for making such a great programme!
You're welcome too :)
If you really want this fixed, try creating an issue here: https://github.com/UB-Mannheim/tesseract/issues
EDIT: Remember to include info about OS and perhaps CPU too! Cpu info tool: https://www.cpuid.com/softwares/cpu-z.html
Good idea, @niksedk
I've just done it: https://github.com/UB-Mannheim/tesseract/issues/31
@vivadavid: could you attach a screenshot from http://download.cpuid.com/cpu-z/cpu-z_1.91-en.zip ?
Should be like:
My CPU has AVX
@wtester7: What does your CPU info look like?
There you go, @niksedk . Thank you for taking the interest.
Tesseract from UB Mannheim should work on any Intel or AMD x86 compatible CPU. Anything else would be a bug.
@vivadavid, did you keep the files from the installation together, or did you move tesseract.exe to a different directory?
@stweil , I didn't move anything.
@stweil my path is: E:\Program Files\PortableProgs\Video\Subtitle Edit\Tesseract411\tesseract.exe
But I don't think that it matters because older OCR versions don't have that problem...
Hi @niksedk,
just tested your newly released SE 3.5.13 & Tesseract 4.1.1 but it is impossible to OCR a mks file. With SE 3.5.11 & Tesseract 4.1.0 there is no problem when OCR'ing...
Something's wrong with Tesseract 4.1.1 in your SE 3.5.13. I have 7zipped the fresh portable SE 3.5.13 with the mks file "00003.mks" in the root directory so you can test the problem for yourself...
https://www.upload.ee/files/11005579/SE3513.7z.html
Btw, this happens with other mks files too!
Thanks Nik!