SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

SE 3.5.13 & Tesseract 4.1.1 OCR Bug ( -1073741795 error ) #3933

Closed: wtester7 closed this issue 4 years ago

wtester7 commented 4 years ago

Hi @niksedk,

I just tested your newly released SE 3.5.13 & Tesseract 4.1.1, but it is impossible to OCR an mks file. With SE 3.5.11 & Tesseract 4.1.0 there is no problem when OCR'ing...

Something's wrong with Tesseract 4.1.1 in your SE 3.5.13. I have 7zipped the fresh portable SE 3.5.13 with the mks file "00003.mks" in the root directory so you can test the problem for yourself...

https://www.upload.ee/files/11005579/SE3513.7z.html

Btw, this happens with other mks files too!

Thanks Nik!
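
A side note on the error code in the title: -1073741795 looks like a Windows NTSTATUS value printed as a signed 32-bit integer. Reinterpreted as unsigned it is 0xC000001D, which Windows defines as STATUS_ILLEGAL_INSTRUCTION. Whether that is the actual cause of this crash is an assumption; the thread does not confirm it. A minimal sketch of the conversion (the function names and the small lookup table are illustrative only):

```python
# Partial lookup table with two well-known NTSTATUS values (not exhaustive).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
}


def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe_exit_code(exit_code: int) -> str:
    """Return the hex form plus a symbolic name if the code is in the table."""
    name = NTSTATUS_NAMES.get(exit_code & 0xFFFFFFFF, "unknown")
    return f"{to_unsigned_hex(exit_code)} ({name})"


print(describe_exit_code(-1073741795))  # prints "0xC000001D (STATUS_ILLEGAL_INSTRUCTION)"
```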

wtester7 commented 4 years ago

@stweil I have also tried the new OCR 5 Alpha from https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w32-setup-v5.0.0-alpha.20200223.exe and I have the same problem as @vivadavid:

[Screenshot: Tesseract_5_Alpha_Error]

niksedk commented 4 years ago

@wtester7 / @vivadavid: Could you try this new version and see if you still get the crash? https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w32-setup-v5.0.0-alpha.20200328.exe

vivadavid commented 4 years ago

Hi, @niksedk

I get a different kind of error:

Error: Tesseract (legacy) engine requested, but components are not present in C:\Program Files (x86)\Tesseract-OCR/tessdata/eng.traineddata!!

The file is certainly there, so the only problem I can think of is that Tesseract, in this alpha, can't access the subfolder because it's using '/' instead of '\'.

Is there a parameter I can use to tell Tesseract where the trained data is?

niksedk commented 4 years ago

Try this traineddata file, which contains both the legacy and the new LSTM models: https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
Any better?

I have also uploaded a new beta that uses the latest version of Tesseract from Stefan: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.14/SubtitleEditBeta.zip
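
On the question of telling Tesseract where the trained data is: the `tesseract` command line accepts a `--tessdata-dir` option (and also honours the `TESSDATA_PREFIX` environment variable), and `--oem` selects the engine mode (0 = legacy only, 1 = LSTM only, 2 = legacy + LSTM, 3 = default). A minimal sketch that only builds the argument list; the function name, image name, and output name are hypothetical, and nothing here is taken from Subtitle Edit's own code:

```python
# Engine modes as documented by the tesseract command line.
OEM_MODES = {
    0: "legacy engine only",
    1: "LSTM engine only",
    2: "legacy + LSTM engines",
    3: "default, based on what is available",
}


def build_tesseract_command(image, output_base, tessdata_dir, lang="eng", oem=3):
    """Build (but do not run) a tesseract command line as an argument list."""
    if oem not in OEM_MODES:
        raise ValueError(f"--oem must be one of {sorted(OEM_MODES)}")
    return [
        "tesseract", image, output_base,
        "--tessdata-dir", tessdata_dir,  # folder containing <lang>.traineddata
        "-l", lang,
        "--oem", str(oem),
    ]


# Hypothetical file names; the tessdata path is the one from the error message.
for mode in OEM_MODES:
    cmd = build_tesseract_command(
        "line001.png", "line001",
        r"C:\Program Files (x86)\Tesseract-OCR\tessdata", oem=mode,
    )
    print(OEM_MODES[mode], "->", cmd)
```

Modes 0 and 2 need a traineddata file that includes the legacy components, which would be consistent with the "components are not present" error above when an LSTM-only file is installed. Each list can be passed to `subprocess.run` once Tesseract is on the PATH.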

vivadavid commented 4 years ago

Thank you, @niksedk

Now everything works. All 4 modes work, both when running Tesseract.exe directly and when running your beta of SE. I'm very happy! Thanks again!

niksedk commented 4 years ago

Thank @stweil :)

wtester7 commented 4 years ago

@niksedk and @stweil, yes, the new version (Tesseract 5.00 2020-03-28 Alpha) is working well! Thank you for the fix!!

wtester7 commented 4 years ago

Btw, Tesseract 5.00 2020-03-28 Alpha is also much faster than 4.1.1. Can't wait for 5.0 Final! Thank you again!!

vivadavid commented 4 years ago

Sorry, I forgot to thank you, @stweil

stweil commented 4 years ago

You are welcome. Thanks for reporting severe problems like this one. Best regards from @UB-Mannheim.

OmrSi commented 4 years ago

Wow. It really is very fast. And it's much better for Arabic.