SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

Subtitle Edit 4.0.9 and Purfview's Faster Whisper (XXL) Still Broken Merging Subtitle Lines and Also Need Translate to English option with large-v3-turbo #9035

Open TranslateFuture opened 1 day ago

TranslateFuture commented 1 day ago

Hello everyone, hope everything is going well.

With the newest Subtitle Edit 4.0.9 beta update (and every release after the final version of Subtitle Edit 4.0.3, i.e. after December 23, 2023), the broken/merged lines, random capitalization, and related issues are still there. I described them here: https://github.com/SubtitleEdit/subtitleedit/issues/8634 and https://github.com/SubtitleEdit/subtitleedit/issues/8209

About 4 months ago, these were some examples and settings of the issue in Korean, Chinese, Spanish, and English: https://github.com/SubtitleEdit/subtitleedit/issues/8634#issuecomment-2228885837

github subtitle edit new broken translations for korean chinese spanish etc.zip

Purfview said that the broken merged lines are due to the --standard command. Thank you so much for all the work and quick fixes.

It's definitely the addition of more breaks or <br> to the processing, given the various random capitalizations and the hastened and/or delayed dialogue lines. Sometimes the lines will also not be capitalized at all, but then the previous/following lines will have a capital letter for essentially every word.

And now that the --beep_off option is the default in Subtitle Edit, is it possible to have an option to revert to the previous default (before Subtitle Edit 4.0.4) without typing the --standard/etc. command every time the program is launched? I also sometimes like the beep noise when the translation is done, so it'd be really great to have a toggle to keep it enabled instead of it always being disabled.

Because when going to the Advanced section of the "Audio to text" menu, those --standard, --sentence, --max_line_width, etc. commands are not saved automatically and you have to always retype them in the command line box for each translation.

Also, I'm not sure if it's also due to the recent updates, but sometimes the batch translation with Purfview's Faster Whisper (XXL) doesn't work at all: it outputs a blank or 0 KB .srt file. Before (around Subtitle Edit 4.0.3/4.0.4) it would output just fine, or in some cases you'd need to run the files one by one instead of using the batch translation option, just like right now with Subtitle Edit 4.0.8/4.0.9.
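For anyone hitting the blank-output problem in batch mode, here is a minimal Python sketch for spotting the failed files afterwards so they can be re-run one by one. It is not part of Subtitle Edit or Faster-Whisper; the function name `find_empty_subtitles` and the folder layout are assumptions made for illustration.

```python
from pathlib import Path


def find_empty_subtitles(folder: str, pattern: str = "*.srt") -> list[Path]:
    """Return subtitle files that are 0 bytes or contain only whitespace."""
    empty = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.stat().st_size == 0:
            # The 0 KB case described above.
            empty.append(path)
            continue
        # "utf-8-sig" drops a leading BOM, so a BOM-only file also counts as blank.
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        if not text.strip():
            empty.append(path)
    return empty
```

Calling `find_empty_subtitles("path/to/output")` returns the list of suspect files; the matching videos are the ones to reprocess individually.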

Oh and Faster-Whisper-XXL r194.2 was released 2 days ago (r193/r194 added diarization!) and seemed to fix other issues (the main issue of broken merging lines/random capitalizations still happens with Faster-Whisper-XXL r194.2 and Subtitle Edit 4.0.9). The regular or original Purfview's Faster-Whisper is not getting updates anymore, so it'd be cool to automatically integrate Faster-Whisper-XXL even more with Subtitle Edit as right now you still have to manually drag the files to the folder when updating or if you want to use it.

The issues or problems are mainly with the regular large-v3 model processing (both with the regular and XXL version), since sometimes it will also automatically fail or won't translate at all after a few seconds. And so you have to change the file format or use a different version of Subtitle Edit and other workarounds like removing the audio from the video first to reprocess it again (this works 95% of the time, turning the video into audio first or a different format/container/etc.).
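The "turn the video into audio first" workaround can be scripted. The sketch below only builds and runs an ffmpeg command; it assumes ffmpeg is installed and on the PATH, and the helper names are hypothetical. The 16 kHz mono WAV target is my choice (Whisper models work on 16 kHz mono audio internally), not something confirmed in this thread.

```python
import subprocess
from pathlib import Path


def build_extract_audio_cmd(video: str, ffmpeg: str = "ffmpeg") -> list[str]:
    """Build an ffmpeg command that writes the audio track next to the video as .wav."""
    out = Path(video).with_suffix(".wav")
    return [
        ffmpeg,
        "-y",               # overwrite the output file if it already exists
        "-i", str(video),   # input video
        "-vn",              # drop the video stream
        "-ac", "1",         # downmix to mono
        "-ar", "16000",     # resample to 16 kHz
        "-c:a", "pcm_s16le",  # plain 16-bit PCM
        str(out),
    ]


def extract_audio(video: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_audio_cmd(video), check=True)
```

The resulting .wav file can then be loaded in the "Audio to text" dialog instead of the original video.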

And yes, the large-v3-turbo option is still unavailable for Purfview's Faster Whisper (XXL) through Subtitle Edit 4.0.9. If you rename the folder right now or put the large-v3-turbo files inside the faster-whisper-large-v3 folder, it will work, but without the ability to Translate to English: it will only transcribe in the original language and the result won't be converted to English after the processing.

The turbo model is understandably not as good as the previous large-v3 and large-v2 models, but for audio/video run times that go over 1 hour or so, it will help those that don't want to wait/split the files/etc.

I apologize once again if the problem doesn't happen with other systems but with my current PC setup it's been like this since last year or Subtitle Edit 4.0.3. Please check the files and screenshot from the previous comments I made, as the differences are immediate there if you pair them with the videos I linked above.

Once more, I cannot stress enough how much I've wished for something like the current Subtitle Edit/Faster-Whisper/etc. combo. It's literally one of the most game-changing programs ever. Thank you so much for always updating quickly and helping everyone.

Purfview commented 1 day ago

> it'd be cool to automatically integrate Faster-Whisper-XXL even more with Subtitle Edit as right now you still have to manually drag the files to the folder when updating or if you want to use it.

Faster-Whisper-XXL is packed with 7z, so I think SE would need to include https://7-zip.org/a/7zr.exe to extract it. A zip would be almost 3 GB and GitHub allows a maximum file size of 2 GB, so I can't offer a zip download for it.
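As a rough illustration of the two points above, this sketch checks a file size against the 2 GB limit and builds a 7zr extraction command. It is not how Subtitle Edit does or will do it; the helper names and the archive/destination names in the usage note are hypothetical. The `x`, `-o<dir>`, and `-y` switches are standard 7-Zip command-line switches.

```python
import subprocess

# The 2 GB per-file limit mentioned above, taken here as 2 GiB.
GITHUB_MAX_FILE_BYTES = 2 * 1024**3


def fits_github_limit(size_bytes: int) -> bool:
    """True if a file of this size stays under the per-file limit."""
    return size_bytes < GITHUB_MAX_FILE_BYTES


def build_7zr_extract_cmd(archive: str, dest: str, seven_zip: str = "7zr.exe") -> list[str]:
    """Build a 7zr command that extracts `archive` into `dest`."""
    # "x" extracts with full paths; "-o" must be glued to the directory (no space);
    # "-y" answers Yes to any prompts so the call does not block.
    return [seven_zip, "x", archive, f"-o{dest}", "-y"]


def extract(archive: str, dest: str) -> None:
    """Run the extraction; raises CalledProcessError if 7zr reports failure."""
    subprocess.run(build_7zr_extract_cmd(archive, dest), check=True)
```

For example, `build_7zr_extract_cmd("Faster-Whisper-XXL.7z", "Whisper")` gives the argument list to run, and `fits_github_limit(3 * 1024**3)` is `False`, which is the reason a roughly 3 GB zip cannot be offered as a single download.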

> if you rename the folder right now or put the large-v3-turbo files inside the faster-whisper-large-v3 folder, it will work but without the ability to Translate to English.

Turbo is not meant to do translations.

Purfview commented 1 hour ago

Another solution is to provide an SFX (self-extracting) archive.