argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.2k stars, 272 forks

Only translating last 30s or so of the audio file. #172

Closed: SergioEstevao closed this 2 months ago

SergioEstevao commented 3 months ago

When using the whisperkit-cli or the apps with a large file (50 minutes of audio), the final report (.srt file) seems to contain only the last 30 s or so of transcribed content.

Is this expected, or am I missing a command-line argument?

ZachNagengast commented 3 months ago

What kind of audio is it? Also, could you share the command you are using to call the CLI? This may be a result of log prob errors causing full windows to be considered silent, which would happen if the audio is particularly noisy. Can you try adjusting the log prob threshold and see if the results improve?
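For context on why a log prob threshold can make whole windows disappear, here is a minimal sketch of the skip rule used in OpenAI's reference Whisper transcription loop (defaults there: no-speech threshold 0.6, log prob threshold -1.0). It assumes WhisperKit applies a comparable rule; `WindowResult` and `isTreatedAsSilent` are hypothetical names, not WhisperKit API, and WhisperKit's actual option names and defaults are not confirmed here.

```swift
/// Per-window statistics a Whisper-style decoder produces (hypothetical type).
struct WindowResult {
    var avgLogProb: Float    // mean log-probability of the decoded tokens
    var noSpeechProb: Float  // probability assigned to the no-speech token
}

/// Sketch of the reference skip rule: a window is dropped as silent only when
/// the no-speech probability is high AND the decoder was not confident
/// (average log prob at or below the threshold) about the text it produced.
func isTreatedAsSilent(_ r: WindowResult,
                       noSpeechThreshold: Float = 0.6,
                       logProbThreshold: Float? = -1.0) -> Bool {
    var skip = r.noSpeechProb > noSpeechThreshold
    // A confident decode overrides the no-speech signal.
    if let threshold = logProbThreshold, r.avgLogProb > threshold {
        skip = false
    }
    return skip
}
```

Lowering the log prob threshold (for example from -1.0 to -2.0) makes it easier for a window with an elevated no-speech probability to be kept rather than discarded.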

SergioEstevao commented 3 months ago

I was trying to transcribe an episode of the Cautionary Tales podcast. The sound is clear for the majority of the episode.

I was using the CLI with this command: swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny" --audio-path ../transcripts/audio.mp3 --report

You can get the audio file from here:

https://chtbl.com/track/39E17/podtrac.com/pts/redirect.mp3/pdrl.fm/18db03/traffic.omny.fm/d/clips/e73c998e-6e60-432f-8610-ae210140c5b1/c0ae8c6e-22f0-4e9b-ac1c-ae390037ac53/a4efe84f-d748-4730-98f5-b1770137cb8e/audio.mp3
SergioEstevao commented 3 months ago

@ZachNagengast After doing some more tests, I believe the bug is in the conversion process when the source file is not mono (1 channel) at 16 kHz.
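To make the condition concrete: a file only goes through the conversion path when it is not already 16 kHz mono, which is why a typical stereo MP3 podcast triggers it. The sketch below is a simplified, pure-Swift stand-in for that path. `PCMAudio`, `needsConversion`, `downmixToMono` and `resampleLinear` are hypothetical names, and the naive linear-interpolation resampler (no anti-aliasing filter) is a swapped-in technique, not what AVAudioConverter or WhisperKit actually do.

```swift
/// Hypothetical decoded-audio container; not a WhisperKit or AVFoundation type.
struct PCMAudio {
    var sampleRate: Double
    var channels: [[Float]]  // one array of samples per channel
}

let whisperSampleRate = 16_000.0  // Whisper models expect 16 kHz mono input

/// True when the input must be converted before transcription.
func needsConversion(_ audio: PCMAudio) -> Bool {
    audio.channels.count != 1 || audio.sampleRate != whisperSampleRate
}

/// Averages all channels into a single mono channel.
func downmixToMono(_ audio: PCMAudio) -> [Float] {
    guard let first = audio.channels.first else { return [] }
    let channelCount = Float(audio.channels.count)
    var mono = [Float](repeating: 0, count: first.count)
    for channel in audio.channels {
        for i in 0..<min(channel.count, mono.count) {
            mono[i] += channel[i] / channelCount
        }
    }
    return mono
}

/// Naive linear-interpolation resampler (illustration only, no filtering).
func resampleLinear(_ samples: [Float], from srcRate: Double, to dstRate: Double) -> [Float] {
    guard !samples.isEmpty, srcRate > 0, dstRate > 0 else { return [] }
    let outCount = Int((Double(samples.count) * dstRate / srcRate).rounded(.down))
    let step = srcRate / dstRate  // input samples advanced per output sample
    var out = [Float](repeating: 0, count: outCount)
    for i in 0..<outCount {
        let pos = Double(i) * step
        let j = Int(pos)
        let frac = Float(pos - Double(j))
        let a = samples[j]
        let b = j + 1 < samples.count ? samples[j + 1] : a
        out[i] = a + (b - a) * frac
    }
    return out
}
```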

This line here

While reading new data into the input buffer in chunks, we always write to the same position (0) of the outputBuffer, so in the end the outputBuffer only contains data from the last chunk read from the input file.
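The failure mode described above can be reproduced with a small model. This is a sketch under stated assumptions: plain `[Float]` arrays stand in for the input and output audio buffers, and `convertInChunks` with its `advanceWriteOffset` flag is a hypothetical helper, not the WhisperKit code. With the flag off, every chunk is written at position 0, so only the last chunk survives; with it on, each chunk is appended after the frames already written.

```swift
/// Copies `input` into one output buffer chunk by chunk, mimicking a
/// read-convert-write loop (hypothetical model of the buffer handling).
func convertInChunks(_ input: [Float], chunkSize: Int, advanceWriteOffset: Bool) -> [Float] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var outputBuffer = [Float](repeating: 0, count: input.count)  // preallocated capacity
    var frameLength = 0   // number of valid frames currently in outputBuffer
    var readPosition = 0  // next frame to read from the input
    while readPosition < input.count {
        let end = min(readPosition + chunkSize, input.count)
        let chunk = input[readPosition..<end]  // "read" the next chunk
        // Buggy: always write at 0. Fixed: write after the frames already stored.
        let writePosition = advanceWriteOffset ? frameLength : 0
        for (k, sample) in chunk.enumerated() {
            outputBuffer[writePosition + k] = sample
        }
        frameLength = writePosition + chunk.count
        readPosition = end
    }
    return Array(outputBuffer[0..<frameLength])
}
```

For 10 input frames read in chunks of 4, the buggy variant returns only the final 2 frames, while the fixed variant returns all 10.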

ZachNagengast commented 3 months ago

Hi @SergioEstevao, I'm having trouble reproducing this. Can you share the hardware and OS you're using when this error occurs?

This is the output I get running the same command: last30bug.srt.zip

ZachNagengast commented 3 months ago

I've confirmed there's something weird with the output buffer. I'll have a fix for this shortly.