Closed SergioEstevao closed 2 months ago
What kind of audio is it? Also, could you provide the command you are using to call the CLI? This may be the result of log prob errors causing full windows to be considered silent, which can happen if the audio is particularly noisy. Can you try adjusting the log prob threshold and see if the results are better?
So I was trying to transcribe an episode from the Cautionary Tales podcast. The sound is clear for the majority of the episodes.
I was using the CLI with this command:
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny" --audio-path ../transcripts/audio.mp3 --report
You can get the audio file from here:
https://chtbl.com/track/39E17/podtrac.com/pts/redirect.mp3/pdrl.fm/18db03/traffic.omny.fm/d/clips/e73c998e-6e60-432f-8610-ae210140c5b1/c0ae8c6e-22f0-4e9b-ac1c-ae390037ac53/a4efe84f-d748-4730-98f5-b1770137cb8e/audio.mp3
@ZachNagengast After doing some more tests, I believe the bug is in the conversion process when the source file is not 1 channel and 16 kHz.
This line here
While reading new data for the input buffer in chunks, we always write to the same position (0) of the outputBuffer, so in the end the outputBuffer only has data from the last chunk read from the input file.
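To illustrate the overwrite described above, here is a minimal sketch that uses plain `[Float]` arrays as stand-ins for the audio buffers. It is not the WhisperKit code: `copyChunksBuggy` and `copyChunksFixed` are hypothetical names, and the real conversion path may differ in detail. The only point is the difference between writing every chunk at position 0 and advancing a write offset.

```swift
// Illustrative sketch only: plain arrays stand in for the audio buffers.
// `copyChunksBuggy` and `copyChunksFixed` are hypothetical names, not WhisperKit APIs.

/// Writes every chunk starting at index 0, so each chunk overwrites the previous one.
func copyChunksBuggy(_ chunks: [[Float]], capacity: Int) -> [Float] {
    var output = [Float](repeating: 0, count: capacity)
    for chunk in chunks {
        for (i, sample) in chunk.enumerated() where i < capacity {
            output[i] = sample
        }
    }
    return output
}

/// Tracks a running write offset so each chunk lands after the previous one.
func copyChunksFixed(_ chunks: [[Float]], capacity: Int) -> [Float] {
    var output = [Float](repeating: 0, count: capacity)
    var writeOffset = 0
    for chunk in chunks {
        // Never write past the end of the output buffer.
        let count = min(chunk.count, capacity - writeOffset)
        guard count > 0 else { break }
        output.replaceSubrange(writeOffset..<(writeOffset + count), with: chunk[0..<count])
        writeOffset += count
    }
    return output
}

let chunks: [[Float]] = [[1, 2], [3, 4], [5, 6]]
print(copyChunksBuggy(chunks, capacity: 6)) // [5.0, 6.0, 0.0, 0.0, 0.0, 0.0]
print(copyChunksFixed(chunks, capacity: 6)) // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

In the buggy version only the last chunk survives, which would match a report that contains only the final part of the audio.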
Hi @SergioEstevao, I'm having trouble reproducing this. Can you share the hardware and OS you're using where this error occurs?
This is the file I get when running your same command with the file: last30bug.srt.zip
I've confirmed there's something weird with the output buffer. Will have something for this shortly.
When using the whisper-kit CLI or apps with a large file (50 minutes of audio), it looks like the final report (.srt file) only shows the last 30s of content transcribed.
Is this expected, or am I missing a command-line argument?