SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Problem with audio #781

Open Herzfrequenz21 opened 5 months ago

Herzfrequenz21 commented 5 months ago

When I cut out only 5 minutes of the movie, everything worked correctly (screen #1). The one hour portion of the movie caused the error (screen #2).

[Screenshots: Безымянный3, Безымянный1]

General
Unique ID : 80764340276249783376247501835629204513 (0x3CC2A569474575CF3E3F1C6E312F9421)
Complete name : C:\Whisper-Faster\55776.mkv
Format : Matroska
Format version : Version 4
File size : 9.92 GiB
Duration : 1 h 0 min
Overall bit rate mode : Variable
Overall bit rate : 23.7 Mb/s
Encoded date : UTC 2024-04-05 09:25:18
Writing application : mkvmerge v82.0 ('I'm The President') 64-bit
Writing library : libebml v1.4.5 + libmatroska v1.7.1

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.1
Format settings : CABAC / 2 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference fra : 2 frames
Format settings, GOP : M=1, N=10
Codec ID : V_MPEG4/ISO/AVC
Duration : 1 h 0 min
Bit rate mode : Variable
Bit rate : 19.9 Mb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 23.976 (23976/1000) FPS
Original frame rate : 23.976 (24000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.400
Stream size : 8.34 GiB (84%)
Default : Yes
Forced : No

Audio #1
ID : 2
Format : DTS XLL
Format/Info : Digital Theater Systems
Commercial name : DTS-HD Master Audio
Codec ID : A_DTS
Duration : 1 h 0 min
Bit rate mode : Variable
Bit rate : 1 144 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Frame rate : 93.750 FPS (512 SPF)
Bit depth : 24 bits
Compression mode : Lossless
Stream size : 491 MiB (5%)
Default : Yes
Forced : No

Audio #2
ID : 3
Format : DTS XLL
Format/Info : Digital Theater Systems
Commercial name : DTS-HD Master Audio
Codec ID : A_DTS
Duration : 1 h 0 min
Bit rate mode : Variable
Bit rate : 2 496 kb/s
Channel(s) : 6 channels
Channel layout : C L R Ls Rs LFE
Sampling rate : 48.0 kHz
Frame rate : 93.750 FPS (512 SPF)
Bit depth : 24 bits
Compression mode : Lossless
Stream size : 1.05 GiB (11%)
Default : Yes
Forced : No

Text #1
ID : 4
Format : PGS
Muxing mode : zlib
Codec ID : S_HDMV/PGS
Codec ID/Info : Picture based subtitle format used on BDs/HD-DVDs
Duration : 59 min 30 s
Bit rate : 209 kb/s
Frame rate : 1.482 FPS
Count of elements : 5291
Stream size : 89.0 MiB (1%)
Default : Yes
Forced : No

Text #2
ID : 5
Format : PGS
Muxing mode : zlib
Codec ID : S_HDMV/PGS
Codec ID/Info : Picture based subtitle format used on BDs/HD-DVDs
Duration : 59 min 30 s
Bit rate : 35.3 kb/s
Frame rate : 0.459 FPS
Count of elements : 1638
Stream size : 15.0 MiB (0%)
Language : Chinese
Default : Yes
Forced : No

Text #3
ID : 6
Format : PGS
Muxing mode : zlib
Codec ID : S_HDMV/PGS
Codec ID/Info : Picture based subtitle format used on BDs/HD-DVDs
Duration : 59 min 30 s
Bit rate : 35.7 kb/s
Frame rate : 0.459 FPS
Count of elements : 1638
Stream size : 15.2 MiB (0%)
Language : Chinese
Default : Yes
Forced : No

Purfview commented 5 months ago

First, you are posting in the wrong repo; your "exe" comes from https://github.com/Purfview/whisper-standalone-win
Second, it's "source", not "sourc".

> The one hour portion of the movie caused the error

If cutting fixed it, then try to remux the whole mkv with MKVToolNix, or use the second audio track with --ff_track 2.

> When I cut out only 5 minutes of the movie, everything worked correctly

It looks like the transcription starts with a hallucination, at least in the first line, and then it looks like music or a song(?). For better transcriptions of movies or podcasts with music, you would want to use Standalone Faster-Whisper-XXL with the --ff_mdx_kim2 argument.
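The suggestions in this comment (remux with MKVToolNix, pick the second audio track, enable --ff_mdx_kim2) could be scripted roughly as below. This is only a sketch under stated assumptions: the executable name `faster-whisper-xxl.exe`, the output file name `55776_remux.mkv`, and passing the media file as the first positional argument are assumptions, not something confirmed in this thread. Only `--ff_track 2` and `--ff_mdx_kim2` are taken from the comment above. The script only builds and prints the commands; it does not run them.

```python
from pathlib import PureWindowsPath
from typing import List, Optional


def remux_command(src: str, dst: str) -> List[str]:
    """Build an mkvmerge call that rewrites the container without re-encoding."""
    return ["mkvmerge", "-o", dst, src]


def transcribe_command(exe: str, src: str,
                       audio_track: Optional[int] = None,
                       mdx_kim2: bool = False) -> List[str]:
    """Build a transcription call using only the flags quoted in this thread."""
    # Assumption: the media file is passed as the first positional argument.
    cmd = [exe, src]
    if audio_track is not None:
        cmd += ["--ff_track", str(audio_track)]  # flag taken from the thread
    if mdx_kim2:
        cmd.append("--ff_mdx_kim2")  # flag taken from the thread
    return cmd


if __name__ == "__main__":
    src = PureWindowsPath(r"C:\Whisper-Faster\55776.mkv")
    dst = src.with_name("55776_remux.mkv")  # hypothetical output name
    exe = "faster-whisper-xxl.exe"          # assumed executable name
    print(remux_command(str(src), str(dst)))
    print(transcribe_command(exe, str(dst), audio_track=2, mdx_kim2=True))
```

To actually execute a printed command, it could be passed to `subprocess.run(cmd, check=True)` once the paths and executable name match the local setup.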

Herzfrequenz21 commented 5 months ago

> If cutting fixed it then try to remux the whole mkv with MKVToolNix or use the second audio track with --ff_track 2.

I cut everything else with MKVToolNix, leaving only the first and then the second audio track.

> it looks like a music/song(?)

No. But they talk really fast in this movie. The movie is in Japanese, and I set the language to Russian. Maybe that's why the text is so strange. I'll use Japanese next time.

> use Standalone Faster-Whisper-XXL and --ff_mdx_kim2 arg.

The full movie has audio errors again. And the 5-minute movie has the error in the screenshot below.

[Screenshot: Безымянный9]

Purfview commented 5 months ago

> I cut everything else with MKVToolNix, leaving only the first and then the second audio track.

Did it solve your issue?

> And the 5-minute movie has the error in the screenshot below

That is another issue. Open these issues in the right repo.

Herzfrequenz21 commented 5 months ago

> Did it solve your issue?

I'm sorry, I forgot to write. No, it didn't. But then I cut a 30-minute clip and it worked. As I wrote above, the 60-minute cut didn't work for me.
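Since the 30-minute cut works and the 60-minute cut fails, one way to narrow down where the failure starts is to test a clip length halfway between the two and repeat. The sketch below only builds the mkvmerge command for such a clip; it is not a confirmed fix. The file names are hypothetical, and mkvmerge cuts at keyframes, so the real clip length is approximate.

```python
from typing import List


def hhmmss(minutes: int) -> str:
    """Format a whole number of minutes as HH:MM:SS."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"


def clip_command(src: str, dst: str, minutes: int) -> List[str]:
    """Build an mkvmerge call that keeps only the first `minutes` of `src`."""
    part = f"parts:00:00:00-{hhmmss(minutes)}"
    return ["mkvmerge", "-o", dst, "--split", part, src]


def midpoint(works: int, fails: int) -> int:
    """Next clip length to try, between a known-good and a known-bad length."""
    return (works + fails) // 2


if __name__ == "__main__":
    good, bad = 30, 60  # minutes, as reported in this thread
    nxt = midpoint(good, bad)
    # Hypothetical file names; adjust to the real paths.
    print(nxt, clip_command("55776.mkv", f"clip_{nxt}min.mkv", nxt))
```

If the clip of that length transcribes correctly, it becomes the new known-good length; otherwise it becomes the new known-bad length, and the midpoint is recomputed.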

Purfview commented 5 months ago

https://github.com/Purfview/whisper-standalone-win/issues/236