SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

3.5.9 Bug and very slow performance #3431

Closed wtester7 closed 5 years ago

wtester7 commented 5 years ago

Hello,

I have discovered a nasty BUG in the portable Subtitle Edit 3.5.10 Version and I am really disappointed with Tesseract 4, it's just too slow!!! I'm using a first generation Quad Core 2,67 Ghz and Win7 x64.

I have done various benchmarks with a BluRay 1117 lines PGS and the OCR with Tesseract 4:

Subtitle 3.5.10 portable:

Tesseract 3.02 ( 4 errors ) = 07 Minutes : 20 Seconds
Tesseract 4 , Engine Mode "Default" ( 4 errors ) = 15 Minutes : 35 Seconds
Tesseract 4 , Engine Mode "Original Tesseract only" ( 14 errors ) = 12 Minutes : 32 Seconds
Tesseract 4 , Engine Mode "LSTM only" ( 2 errors ) = 15 Minutes : 22 Seconds
Tesseract 4 , Engine Mode "Tesseract + LSTM" ( 5 errors ) = 16 Minutes : 46 Seconds

Subtitle 3.5.6 portable:

Tesseract ( 5 errors ) = 01 Minute : 50 Seconds !!!


Bug in Subtitle 3.5.10 portable:

Bug: Prompt for unknown Words Popup - Change all Button - replaces original eng_OCRFixReplaceList.xml ( 136 KB ) into a eng_OCRFixReplaceList.xml ( 2 KB ) - everything from the original eng_OCRFixReplaceList.xml is lost!

WORKS: Prompt for unknown Words Popup - Add to names/noise list Button - adding words to en_names.xml.
WORKS: Prompt for unknown Words Popup - Add to user dictionary Button - adding words to en_US_user.xml.
WORKS: Prompt for unknown Words Popup - USE ALWAYS - adds words to eng_OCRFixReplaceList_User.xml

Bugs in Subtitle 3.5.6 portable:

Bug: Prompt for unknown Words Popup - Change all Button - replaces original eng_OCRFixReplaceList.xml ( 136 KB ) into a eng_OCRFixReplaceList.xml ( 2 KB ) - everything from the original eng_OCRFixReplaceList.xml is lost!
Bug: Prompt for unknown Words Popup - Add to names/noise list Button - doesn't add words from the unknown Words Popup into en_names.xml, but from the unknown words table list it adds into en_names.xml
Bug: Prompt for unknown Words Popup - Add to user dictionary Button - doesn't add words from the unknown Words Popup into en_US_user.xml, but from the unknown words table list it adds into en_US_user.xml

WORKS: Prompt for unknown Words Popup - USE ALWAYS - adds words to eng_OCRFixReplaceList_User.xml


As you can see, the Tesseract in Subtitle 3.5.6 portable is the fastest: OCR takes only 01 Minute : 50 Seconds with 5 errors!!! What is going on with Tesseract 4??? It's about 8x slower with more errors!!!

It would be really great if you can provide a new version with the Tesseract from 3.5.6 with all mentioned bugs fixed. Well most mentioned bugs are fixed in version 3.5.10, the only remaining bug is the:

Prompt for unknown Words Popup - Change all Button - replaces original eng_OCRFixReplaceList.xml ( 136 KB ) into a eng_OCRFixReplaceList.xml ( 2 KB ) - everything from the original eng_OCRFixReplaceList.xml is lost!

I would really appreciate it, thanks!
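
For reference, the reported timings can be put on a common scale. The following is a minimal Python sketch that converts the figures from the benchmark above into seconds and expresses each 3.5.10 run as a multiple of the 3.5.6 run. The helper names (`to_seconds`, `slowdown`) are made up for illustration, and the numbers are copied from this comment, not re-measured.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'MM Minutes : SS Seconds' reading into total seconds."""
    return minutes * 60 + seconds


def slowdown(run_seconds: int, baseline_seconds: int) -> float:
    """How many times longer a run took than the baseline run."""
    return run_seconds / baseline_seconds


# Figures copied from the report above (1117-line PGS file).
BASELINE_3_5_6 = to_seconds(1, 50)
RUNS_3_5_10 = {
    "Tesseract 3.02": to_seconds(7, 20),
    "Tesseract 4, Default": to_seconds(15, 35),
    "Tesseract 4, Original Tesseract only": to_seconds(12, 32),
    "Tesseract 4, LSTM only": to_seconds(15, 22),
    "Tesseract 4, Tesseract + LSTM": to_seconds(16, 46),
}

for name, secs in RUNS_3_5_10.items():
    ratio = slowdown(secs, BASELINE_3_5_6)
    print(f"{name}: {secs} s = {ratio:.1f}x the 3.5.6 time")
```

On these figures, Tesseract 3.02 in 3.5.10 takes 4.0x as long as the 3.5.6 run, and the Tesseract 4 modes take roughly 6.8x to 9.1x as long.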

OmrSi commented 5 years ago

That's why he returned to Tesseract 3 and added the possibility to download 4. He tested and found that 4 is too slow, so it's not really a bug, it just is...

wtester7 commented 5 years ago

Hello OmrSi, please read the whole text again.

The Tesseract ( I don't know which version it is ) of the portable version of Subtitle Edit 3.5.6 is still much faster ( "01 Minute : 50 Seconds with 5 errors" ) than Tesseract 3.02 of Version 3.5.10 ( "07 Minutes : 20 Seconds with 4 errors" ).

Also this bug of version 3.5.10 still remains:

Prompt for unknown Words Popup - Change all Button - replaces original eng_OCRFixReplaceList.xml ( 136 KB ) into a eng_OCRFixReplaceList.xml ( 2 KB ) - everything from the original eng_OCRFixReplaceList.xml is lost!


So my request is still open, thanks!:

It would be really great if you can provide a new version with the Tesseract from 3.5.6 with all mentioned bugs fixed. Well most mentioned bugs are fixed in version 3.5.10, the only remaining bug is the:

Prompt for unknown Words Popup - Change all Button - replaces original eng_OCRFixReplaceList.xml ( 136 KB ) into a eng_OCRFixReplaceList.xml ( 2 KB ) - everything from the original eng_OCRFixReplaceList.xml is lost!

I would really appreciate it, thanks!

niksedk commented 5 years ago

@wtester7: I cannot re-create the bug about overwriting eng_ocr eng_OCRFixReplaceList.xm... just downloaded the beta and used "change all" - it just created the user file. Could you attach the sup you're ocr'ing?

Ding-adong commented 5 years ago

@wtester7 **Bug**: Prompt for unknown Words Popup - Change all Button - replaces original eng_OCRFixReplaceList.xml ( 136 KB ) into a eng_OCRFixReplaceList.xml ( 2 KB ) - everything from the original eng_OCRFixReplaceList.xml is lost!

This has happened to me now and then for 5 years and I couldn't understand why. I tried to replicate the bug but failed. See #3280. Does it always happen when you click the 'change all' button, or randomly now and then?

I suggest opening the eng_OCRFixReplaceList.xml in notepad++ when you ocr a file. If notepad++ dialog box appear saying, "file change blah blah; reload from disk" click on no/cancel then save to restore your original. Also make a backup file and put it in C:\Users****\AppData\Roaming\Subtitle Edit just in case.
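
The backup advice can also be scripted. Below is a minimal sketch, assuming Python is available on the machine: it copies the replace list to a backup folder before an OCR session and restores it afterwards if the content has changed. The function names and the backup folder layout are hypothetical, and nothing here is part of Subtitle Edit itself.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file content, used to detect any change."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(original: Path, backup_dir: Path) -> Path:
    """Copy the replace list into backup_dir and return the backup path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (original.name + ".bak")
    shutil.copy2(original, target)
    return target


def restore_if_changed(original: Path, backup: Path) -> bool:
    """Restore the backup when the original is missing or differs.

    Returns True if a restore was performed, False if nothing changed.
    """
    if original.exists() and file_digest(original) == file_digest(backup):
        return False
    shutil.copy2(backup, original)
    return True
```

A possible use would be to call `back_up` on eng_OCRFixReplaceList.xml once before starting OCR and `restore_if_changed` after the session. Note that this restores on any difference, so it would also undo deliberate edits to that file.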

wtester7 commented 5 years ago

Yes this bug always happens when pressing "Change All" in the Prompt for unknown Words Popup, in both Subtitle 3.5.6 portable & Subtitle 3.5.10 portable version on Win 7 x64.

Here is the PGS in mks from mkvtoolnix:

00274.zip

niksedk commented 5 years ago

> Yes this bug always happens when pressing...

Well, it does not happen here - could you download the latest beta to an empty folder and write your exact steps? As @Ding-adong also had the issue I guess the bug must be hidden somewhere, but if I cannot re-create it I cannot fix it :(

Thx for the file :)

Ding-adong commented 5 years ago

@wtester7 Can you check the new eng_OCRFixReplaceList.xml and compare it with eng_OCRFixReplaceList_User.xml and see if it is the same. Sometimes it would ask to reload eng_OCRFixReplaceList and the data all gone and replaced with a exact copy of eng_OCRFixReplaceList_User.

wtester7 commented 5 years ago

> I suggest opening the eng_OCRFixReplaceList.xml in notepad++ when you ocr a file. If notepad++ dialog box appear saying, "file change blah blah; reload from disk" click on no/cancel then save to restore your original. Also make a backup file and put it in C:\Users****\AppData\Roaming\Subtitle Edit just in case.

Thanks for the notepad++ trick, I will try it. I always keep a backup of eng_OCRFixReplaceList.xml and use it to replace the small overwritten one. Or I just didn't use Change All at all...

wtester7 commented 5 years ago

> @wtester7 Can you check the new eng_OCRFixReplaceList.xml and compare it with eng_OCRFixReplaceList_User.xml and see if it is the same. Sometimes it would ask to reload eng_OCRFixReplaceList and the data all gone and replaced with a exact copy of eng_OCRFixReplaceList_User.

Yep, this is exactly what happens: it just writes the content of eng_OCRFixReplaceList_User.xml into eng_OCRFixReplaceList.xml
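
The "is it the same as the user file" check can be done on the entries rather than by eye. This is a minimal Python sketch, assuming both files have the `ReplaceList` layout with one child element per section; the function names are hypothetical, and attributes are compared generically so no particular attribute names are assumed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def replace_entries(path: Path) -> set:
    """Return one tuple per entry: (section tag, entry tag, sorted attributes)."""
    root = ET.parse(path).getroot()
    return {
        (section.tag, entry.tag, tuple(sorted(entry.attrib.items())))
        for section in root
        for entry in section
    }


def is_copy_of_user_list(main_list: Path, user_list: Path) -> bool:
    """True when the main list holds exactly the same entries as the user list."""
    return replace_entries(main_list) == replace_entries(user_list)
```

Because only entries are compared, differences in whitespace, attribute order, or empty sections do not affect the result.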

wtester7 commented 5 years ago

Here it is again! Downloaded the Beta from: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.9/SubtitleEditBeta.zip

Extracted it to the desktop and started Subtitle Edit:

  1. Dragged the mks file to Subtitle Edit.
  2. Selected OCR Method Tesseract 3.02, Language = English, Dictionary = English en_US.
  3. Prompt for unknown words = checked.
  4. Started OCR; the first line gives the first Popup, for Toho.
  5. Clicked on edit whole Text.
  6. Added an extra . after LTD. so that the change all button appears.
  7. Clicked on it.

Now the original 136 KB eng_OCRFixReplaceList.xml is only 1 KB, with the following content:

<ReplaceList>
  <WholeWords />
  <PartialLines />
  <BeginLines />
  <EndLines />
  <WholeLines>
    <Line from="TOHO CO., LTD." to="TOHO CO., LTD.." />
  </WholeLines>
  <RemovedWholeWords />
  <RemovedPartialLines />
  <RemovedBeginLines />
  <RemovedEndLines />
  <RemovedWholeLines />
</ReplaceList>

Ding-adong commented 5 years ago

@niksedk Ok, I got the bug again.

1- When spell check box pops up and I modified a word directly into the 'word not found' box and click 'change all' - nothing happens.

2- When I clicked on 'edit whole text', as @wtester7 did, modified a word and then clicked on 'change all' - the nasty bug was back again. I checked notepad++ and it asked me to reload; obviously I refused. I went into the dictionaries directory, renamed eng_OCRFixReplaceList and opened the file. It is now a carbon copy of the updated eng_OCRFixReplaceList_User. The bug is updating both eng_OCRFixReplaceList and eng_OCRFixReplaceList_User.

I rarely use 'edit whole text' hence the bug rarely happened. I use option 1 above almost all the time.
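
A quick way to spot an overwritten list is to count the entries per section. The sketch below, in Python, does this for the 1 KB content posted earlier in the thread; the function name is hypothetical and the XML is copied from that comment.

```python
import xml.etree.ElementTree as ET

# Content of the overwritten file, as posted in the thread.
OVERWRITTEN = """<ReplaceList>
  <WholeWords />
  <PartialLines />
  <BeginLines />
  <EndLines />
  <WholeLines>
    <Line from="TOHO CO., LTD." to="TOHO CO., LTD.." />
  </WholeLines>
  <RemovedWholeWords />
  <RemovedPartialLines />
  <RemovedBeginLines />
  <RemovedEndLines />
  <RemovedWholeLines />
</ReplaceList>"""


def entries_per_section(xml_text: str) -> dict:
    """Map each section tag to the number of entries it contains."""
    root = ET.fromstring(xml_text)
    return {section.tag: len(section) for section in root}


counts = entries_per_section(OVERWRITTEN)
total = sum(counts.values())
print(counts)
print(f"total entries: {total}")
```

For the posted content this reports a single entry, in `WholeLines`, with every other section empty. The original 136 KB list should show far more than that, so a very low total is a hint that the file was replaced.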

niksedk commented 5 years ago

Thx :)

Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.9/SubtitleEditBeta.zip

About this mks file... it contains many unknown names and that makes SE do a lot of retries in Tesseract using different settings without much luck. For this file I would recommend using the "Binary image compare" or add the names to the names list.

wtester7 commented 5 years ago

You know I had the feeling about the mks file, that it somehow is not fully compatible with Subtitle Edit. I will try to extract the .sup file with tsMuxer and try again if the bug still appears!

wtester7 commented 5 years ago

> Thx :)
>
> Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.9/SubtitleEditBeta.zip
>
> About this mks file... it contains many unknown names and that makes SE do a lot of retries in Tesseract using different settings without much luck. For this file I would recommend using the "Binary image compare" or add the names to the names list.

Thanks nik! I will also try your new beta and report back!

Ding-adong commented 5 years ago

@niksedk Is that all? Fixed by changing 1 line. Been like this for 5 years and I was expecting a complicated bug. Well hopefully it works and one mysterious bug solved.

@wtester7

> Tesseract 4 , Engine Mode "Default" ( 4 errors ) = 15 Minutes : 35 Seconds
> Tesseract 4 , Engine Mode "Original Tesseract only" ( 14 errors ) = 12 Minutes : 32 Seconds
> Tesseract 4 , Engine Mode "LSTM only" ( 2 errors ) = 15 Minutes : 22 Seconds
> Tesseract 4 , Engine Mode "Tesseract + LSTM" ( 5 errors ) = 16 Minutes : 46 Seconds

Uncheck all boxes on the right side below the dictionary. Use Tesseract 4 with Engine Mode "Tesseract + LSTM" (it produces the fewest errors) to speed through the OCR. Then use fix common errors in normal SE mode. It is much, much quicker this way: no more than 3 minutes in total for a 1-hour programme with about 500 lines.

For the image palette, it is best to use a white background and black characters, which are easy to read.

wtester7 commented 5 years ago

Ding-adong, thx for the tips but I won't do that, I like these options checked because:

The Tesseract ( I don't know which version it is ) of the portable version of Subtitle Edit 3.5.6 is still much faster ( "01 Minute : 50 Seconds with 5 errors" ) than Tesseract 3.02 of Version 3.5.10 ( "07 Minutes : 20 Seconds with 4 errors" ).

In both versions, all options are checked: Fix OCR errors, Prompt for unknown words, try to guess unknown words and Auto break paragraph

wtester7 commented 5 years ago

nik, could you please provide a version with the same Tesseract from 3.5.6 to your new beta? It's the fastest and best as you can see through the benchmarks...

Ding-adong commented 5 years ago

If you still have 3.5.6 you can go into the directory and look at the tess file properties to see what version it is. It could be pre 3.02.

wtester7 commented 5 years ago

Nik, and @ALL ,

the mks file wasn't at fault, the same thing happened with the original .sup file from tsMuxer. The same bug happened and the eng_OCRFixReplaceList.xml was overwritten.

But now with the new fixed https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.9/SubtitleEditBeta.zip

Both mks & sup file are working, so it wasn't their fault. Now the original 135 KB eng_OCRFixReplaceList.xml still remains and a new eng_OCRFixReplaceList_User.xml with the "Change All" changes are correctly written!

Good job Nik!

Ding-adong commented 5 years ago

Thanks for coming along @wtester7. You solved a 5 year riddle ha ha.

Ding-adong commented 5 years ago

The tess files are the same now as in 3.5.6. Only the OCR directory, latin.db and latin.nocr have changed.

wtester7 commented 5 years ago

> If you still have 3.5.6 you can go into the directory and look at the tess file properties to see what version it is. It could be pre 3.02.

Tesseract from 3.5.6 = 10,7 MB
Tesseract 3.02 from Beta = 6,49 MB

Both Tesseract.exe are bit for bit the exact same size, 2,24 MB and both are Version 3.02 in the exe file properties...

Only 3.5.6 has \Tesseract\tessdata\deu.traineddata and \Tesseract\tessdata\deu-frak.traineddata; these files are not in the Beta. But deu is for German, so I doubt this has anything to do with OCR speed. Maybe if you OCR German subtitles...

But still, OCR in Version 3.5.6 = 01 Minute : 50 Seconds and OCR in Beta = 05 Minutes : 24 Seconds...

So some change in the code, compiled into SubtitleEdit.exe, must be causing the slower performance; I had thought it was the different Tesseract versions... ???

Ding-adong commented 5 years ago

Tested 3.5.6 and the timings were the same, just a few more errors compared to 3.5.9. Tess 4.0 used all my 12 CPU cores, between 90 and 100%, and I wasn't doing anything else. Perhaps you were running other programs that were using the CPU, which would explain why your times are much longer.

wtester7 commented 5 years ago

Nope, no background programs running, it's the same environment for both tests...

Is my first-generation Quad Core i7 2,67 Ghz the reason OCR is much slower in versions newer than 3.5.6? Because something has changed since 3.5.6, and I thought it was Tesseract's fault!

Ding-adong commented 5 years ago

Well i can't think of anything else. Over to @niksedk .

Tess 4 is great, I do not know why many people are giving it the thumbs down.

wtester7 commented 5 years ago

I hope Nik can give us the answer... thanks! I just need the speed of 3.5.6 with all fixes in the Beta! Then I am happy :)

I suspect that it has to do with old CPU's although they are Quad Core, mine is now 10 years old. So I guess since version 3.5.6 something has changed for old CPU's and they can't handle the OCR processing well, thus it's much slower OCR performance. I am not the only one, many people have complained and they stick to version 3.5.6...

Ding-adong commented 5 years ago

I DL your 00274 mks file and it took 4 minutes to reach 500 lines whereas I used Vob file of 500 lines and it took 1 minute 10 seconds. Different file format different speed.

wtester7 commented 5 years ago

> I DL your 00274 mks file and it took 4 minutes to reach 500 lines whereas I used Vob file of 500 lines and it took 1 minute 10 seconds. Different file format different speed.

Ding-adong, the same 00274.mks file was used for both versions and still: OCR in Version 3.5.6 = 01 Minute : 50 Seconds and OCR in Beta = 05 Minutes : 24 Seconds...

All settings the same, and the same subtitle file was used, only in the Beta Tesseract 3.02 was selected. In 3.5.6 you can't select any other Tesseract version...

So... :)

wtester7 commented 5 years ago

Try download 3.5.6 , use the same settings, the same subtitle file I provided and tell me your speed results: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.6/SE356.zip

It's really interesting because you wrote you had a 6-Core, so a relatively new CPU... P.S. for these benchmarks, turn off prompt for unknown words so that the pop up never appears! I have measured it with a stop watch program ;)

Ding-adong commented 5 years ago

I have tested on the 3.5.6 with all options unchecked. I saw no difference. The only difference is mks vs vob. Duodec/dozen cores does help when using tess 4.

wtester7 commented 5 years ago

Wow, that's weird... so it really is people's old CPU's that can't process the OCR well in version 3.5.7. - 3.5.10 Beta...

@niksedk what has changed since version 3.5.6 besides Tesseract 4? It would be great to have the same performance as in 3.5.6 with all bug fixes included. Because for my 10 year old Quad Core i7 2,67 Ghz -> OCR in Version 3.5.6 = 01 Minute : 50 Seconds and OCR in Beta = 05 Minutes : 24 Seconds...

All settings the same, and the same subtitle file was used, only in the Beta Tesseract 3.02 was selected.

niksedk commented 5 years ago

I've an old Intel Core i7-3770 (Quad core 3.40 Ghz) from 2012 which gave these results:

Do try the binary ocr (it requires a bit more review/fix/inspect but I do think the result is better)

I think that SE 3.5.9 re-tries more times than 3.5.6 when it encounters an unrecognized word.

I've a new laptop at work... here OCR is crazy fast, so I guess Tesseract really likes many fast cores.

Ding-adong commented 5 years ago

@niksedk

> I think that SE 3.5.9 re-tries more times than 3.5.6 when it encounters an unrecognized word.

That would also mean speeding through OCR with all options unchecked is faster, since there would be no retries. I also assume that in normal SE, fix common errors doesn't have any 'retries'?

niksedk commented 5 years ago

Yes, all options unchecked will be faster - especially with a file with many Japanese names. No, fix common errors do not have any retries (of OCR'ing images).

wtester7 commented 5 years ago

I've tried the binary image compare in the Beta. It's really fast ( took 55 Seconds to finish ) but the result is very poor... About 100 lines need to be fixed, which is not usable...

I will stick to the Beta with Tesseract 3.02, which is somewhat of an OK performance... @niksedk , is it much work for you to do to switch the Tesseract from 3.5.6 to the Beta, so I can get the same speed like in 3.5.6? Although I don't know if Tesseract is at fault for the performance slow down, it would be still great to test!

For all tests I have chosen: Fix OCR errors, Try to guess unknown words & Auto break paragraph. Only Prompt for unknown words is unchecked, for benchmark purposes...

niksedk commented 5 years ago

You have to tweak the settings for binary image compare. Use 11 for "number of pixels is space" - if a word is wrong you can double click on the text in the list view to inspect+fix the wrong letter. I know it's a bit harder to learn, but it's possible to make it work nicely.

It's the same Tesseract in 3.5.6 and "Tesseract 3.02" in latest versions...

wtester7 commented 5 years ago

> It's the same Tesseract in 3.5.6 and "Tesseract 3.02" in latest versions...

@niksedk OK, something is weird because I thought that the Tesseract from 3.5.6 is the same as in the Beta because of the same filesize ( bit for bit , it even states in the Tesseract.exe Version 3.02 for 3.5.6 as the same as in the Beta ).

If it's not Tesseract's fault what is causing the performance loss? Because I have tried now in 3.5.6 and in the latest Beta following options in OCR with exact same mks file: All options disabled = unchecked Fix OCR errors, unchecked Prompt for unknown words, unchecked Try to guess for unknown words, unchecked Auto break paragraph if more than two lines

Only Italic and Music Symbols for both versions activated

In 3.5.6 - the mks file is in 2 Minutes : 20 Seconds finished.
In Beta - the mks file is in 4 Minutes : 05 Seconds finished.

If the Tesseract in both 3.5.6 and Beta are the same what else is causing the huge performance slow down in the OCR??? Something else changed in your software from 3.5.6 to the Beta that has caused this slow down...

Btw thank you and @Ding-adong for your hard work :)

wtester7 commented 5 years ago

Does it have something to do with this issue here? https://github.com/tesseract-ocr/tesseract/issues/2205

Ding-adong commented 5 years ago

Two more things I did.

  1. Disabled Regex - Slightly faster.
  2. Observed CPU and memory usage. Pre OCR, 25% of 16gb of memory used = 4gb. During tess 4 ocr speeding through, 33% of memory used = 5.28gb. An extra 1.28gb used. Perhaps not having enough memory could slow things down, as well as cores.

Ding-adong commented 5 years ago

> In 3.5.6 - the mks file is in 2 Minutes : 20 Seconds finished.
> In Beta - the mks file is in 4 Minutes : 05 Seconds finished.

@wtester7 Another possibility, @niksedk needs to confirm, the ocr filesize is bigger now compare to 3.5.6 portable and maybe a factor why beta is slower.

Lastly, maybe your 10 year old quad core is 'dying'.

wtester7 commented 5 years ago

> In 3.5.6 - the mks file is in 2 Minutes : 20 Seconds finished.
> In Beta - the mks file is in 4 Minutes : 05 Seconds finished.
>
> @wtester7 Another possibility, @niksedk needs to confirm, the ocr filesize is bigger now compare to 3.5.6 portable and maybe a factor why beta is slower.
>
> Lastly, maybe your 10 year old quad core is 'dying'.

I copied the Latin.db ( because of the different filesize compared to Beta ) from the OCR folder of 3.5.6 into the Beta, and the performance is still the same; nothing has changed.

My CPU is in pretty good shape and still very healthy, so no it's not dying ;) I guess something in the compiled SubtitleEdit.exe since 3.5.6 has changed or how Tesseract was compiled for Windows i.e this issue: https://github.com/tesseract-ocr/tesseract/issues/2205

I hope @niksedk can solve this mystery ;) The same options were used, see https://github.com/SubtitleEdit/subtitleedit/issues/3431#issuecomment-468447964 ( all deactivated in OCR, only Italic and Music symbols activated ), and the same subtitle file, but the speed differs, and nik told us that the Tesseract in 3.5.6 is the same one the Beta is using, so it's not normal. The slowdown is located somewhere in the software :)

wtester7 commented 5 years ago

Btw I did a comparison benchmark of the mks vs the org. sup file. Both speeds are exactly the same to finish. This means the mks file is not the problem, now I can't think of anything else...

wtester7 commented 5 years ago

@niksedk I found the culprit!

Copy the SubtitleEdit.exe from 3.5.6 and replace it in your Beta. Rename the folder "Tesseract302" to "Tesseract".

Drag the mks file from https://github.com/SubtitleEdit/subtitleedit/files/2916274/00274.zip to Subtitle Edit with following OCR options:

OCR is finished in 01 Minute : 49 Seconds. That is also 30 Seconds faster than with all options disabled ( 2 Minutes : 20 Seconds ), vs 4 Minutes : 05 Seconds with your original SubtitleEdit.exe from Beta!!!

So your SubtitleEdit.exe you've compiled is the problem, something has changed since 3.5.6 and that is the massive performance slowdown!!

Cheers =)

OmrSi commented 5 years ago

Well, he did say: "I think that SE 3.5.9 re-tries more times than 3.5.6 when it encounters an unrecognized word."

wtester7 commented 5 years ago

> Well, he did say, I think that SE 3.5.9 re-tries more times than 3.5.6 when it encounters an unrecognized word.

This can't be true because with all options checked ( minus disabled Prompt for unknown words ) it's faster than with all options disabled ( this means no guessing for unrecognized word ) & you can try these same settings ( all options checked & all options disabled ) with the SubtitleEdit.exe from 3.5.6 vs the original SubtitleEdit.exe from Beta!!

This leads me to conclude that re-tries for unrecognized words are not the cause! Something else is behind this slowdown. Nik should check the changes / change logs from 3.5.7 up to Beta and compare them with 3.5.6 to see what might possibly be the problem...

Ding-adong commented 5 years ago

I just did the same experiment and suddenly, with all options disabled, it was 10 seconds slower than enabling all except for 'prompt unknown words'. Also the number of fixes needed was 228 vs 35. Surely if the software is fixing errors then it should take longer.

wtester7 commented 5 years ago

> I just did the same experiment and suddenly, with all options disabled, it was 10 seconds slower than enabling all except for 'prompt unknown words'. Also the number of fixes needed was 228 vs 35. Surely if the software is fixing errors then it should take longer.

You see, this supports my point that "re-tries for unrecognized word" is not the cause of the performance slowdown. For me it's 30 Seconds faster with all options on ( except for prompt unknown words disabled ).

Simple as that:
SubtitleEdit.exe from 3.5.6 = 1 Minute : 49 Seconds
SubtitleEdit.exe from Beta = 4 Minutes : 05 Seconds

That's about 2.25x slower, which is A LOT, at least for me...

Like I wrote: @niksedk should check the changes / change logs from 3.5.7 up to Beta and compare them with 3.5.6 to see what might possibly be the problem... and see my test here: https://github.com/SubtitleEdit/subtitleedit/issues/3431#issuecomment-468634096

Thanks!

Ding-adong commented 5 years ago

In order to pinpoint at which version it started to slow down, can you do the experiment with 3.5.7 https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.7/SE357.zip and 3.5.8 https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.8/SE358.zip

Ding-adong commented 5 years ago

@wtester7 Do you mind trying this file and see how long it would take you. https://github.com/Ding-adong/subtitleedit/blob/master/Vob.zip

wtester7 commented 5 years ago

@Ding-adong

I can't test it with 3.5.7 because 3.5.7 has Tesseract 4 folder and it's linked to the SubtitleEdit.exe. Tried to trick it with copying Tesseract 3.02 into the folder but I get errors in OCR.

In 3.5.8 Tesseract 3.02 is present and I did an OCR with all OCR options enabled ( minus prompt unknown words disabled ) it finished in 3 Minutes 32 Seconds. Still 30 Seconds faster than Beta but the slow down is already present.

So I suspect the performance slow down starts with the SubtitleEdit.exe from Version 3.5.7!
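
To line up the per-version figures quoted in the thread, here is a small Python sketch that expresses each reported time as a multiple of the 3.5.6 executable's time. The numbers are copied from the comments above and were not all taken with identical OCR options, so the ratios are only a rough guide; the names `REPORTED` and `ratios` are made up for illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'MM Minutes : SS Seconds' reading into total seconds."""
    return minutes * 60 + seconds


# Times as reported in the thread for the same 00274 mks file.
REPORTED = {
    "SubtitleEdit.exe from 3.5.6": to_seconds(1, 49),
    "3.5.8": to_seconds(3, 32),
    "Beta": to_seconds(4, 5),
}

baseline = REPORTED["SubtitleEdit.exe from 3.5.6"]
ratios = {name: round(secs / baseline, 2) for name, secs in REPORTED.items()}

for name, ratio in ratios.items():
    print(f"{name}: {REPORTED[name]} s, {ratio}x the 3.5.6 time")
```

With these figures, 3.5.8 comes out at about 1.94x and the Beta at about 2.25x the 3.5.6 time, and the Beta is 33 seconds slower than 3.5.8.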