CoreML: Repeating parts of text instead of transcribing files longer than an hour

russell-dot-js commented 7 months ago

See #612: the error seems to occur mainly when using a CoreML model. Rebuilding without CoreML resolves the issue.

lucidyan commented 7 months ago

I can confirm that with CoreML it freezes almost immediately on large files, but rebuilding without CoreML does not completely solve the problem (it just shows up later in large files).

But this helped me: https://github.com/ggerganov/whisper.cpp/issues/896#issuecomment-1569586018

KNWR commented 4 months ago

I tried the above but still ran into issues. As a workaround, I use a script that splits large audio files into smaller chunks (here 1200 seconds, i.e. 20 minutes). It uses ffprobe to read the duration and ffmpeg to split m4a files.

Usage:

chmod u+x split_audio.sh
./split_audio.sh <path to your m4a file>

After transcribing the parts with whisper, I combine the transcripts into one file with cat, appending the later parts to the first: cat file2.txt file3.txt ... >> file1.txt
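When there are ten or more parts, listing the files by hand gets tedious and a glob such as `*_part*.txt` sorts `part10` before `part2`. A minimal sketch that appends the transcripts in numeric order instead; `combine_transcripts` is a hypothetical helper, and it assumes the transcripts are named `<base>_part<N><suffix>`. With whisper.cpp's `-otxt` option the output file is named after the input file, so for parts converted to `rec_part1.wav` the suffix would be `.wav.txt`.

```shell
#!/bin/bash

# Hypothetical helper: append transcripts named "<base>_part<N><suffix>"
# to <out> in numeric order (N = 1, 2, 3, ...), stopping at the first
# missing part. A plain glob would sort part10 before part2.
combine_transcripts() {
    local base="$1" suffix="$2" out="$3"
    local i=1
    : > "$out"    # start from an empty output file
    while [[ -f "${base}_part${i}${suffix}" ]]; do
        cat "${base}_part${i}${suffix}" >> "$out"
        i=$((i + 1))
    done
}
```

Usage: `combine_transcripts recording .txt recording_full.txt`. The output file should not itself match the `<base>_part<N><suffix>` pattern, since it is truncated first.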

Script:

#!/bin/bash

# Function to get the duration of the audio file in seconds
get_audio_duration() {
    local duration
    duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$1")
    echo "$duration"
}

# Function to split the audio file into 20-minute (1200 s) parts
split_audio() {
    local file_path="$1"
    local segment_duration=1200  # seconds
    local duration file_name file_ext num_parts start_time output_file i
    duration=$(get_audio_duration "$file_path")
    file_name="${file_path%.*}"
    file_ext="${file_path##*.}"
    # Ceiling division: bc truncates, so add one part for any leftover audio
    num_parts=$(echo "$duration / $segment_duration" | bc)
    if (( $(echo "$duration % $segment_duration > 0" | bc) )); then
        num_parts=$((num_parts + 1))
    fi

    for ((i = 0; i < num_parts; i++)); do
        start_time=$((i * segment_duration))
        output_file="${file_name}_part$((i + 1)).${file_ext}"
        # -c copy cuts the stream without re-encoding
        ffmpeg -i "$file_path" -ss "$start_time" -t "$segment_duration" -c copy "$output_file"
    done
}

# Main script execution
if [[ $# -ne 1 ]]; then
    echo "Usage: $0 <path_to_m4a_file>"
    exit 1
fi

file_path="$1"

split_audio "$file_path"
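The part count in `split_audio` is a ceiling division of the duration by the segment length: a 3725.12 s recording gives 3 full 1200 s parts plus a 125.12 s remainder, so 4 parts, while an exact 3600 s recording gives 3. A minimal sketch of the same calculation in awk, useful as a cross-check of the bc arithmetic; `count_parts` is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash

# Hypothetical helper: parts = ceil(duration / segment), computed in awk.
# Accepts the fractional durations ffprobe prints (e.g. "3725.120000").
count_parts() {
    local duration="$1" segment="$2"
    awk -v d="$duration" -v s="$segment" 'BEGIN {
        n = int(d / s)        # number of full segments
        if (d > n * s) n++    # one more part for any leftover audio
        print n
    }'
}
```

For example, `count_parts 3725.12 1200` prints `4`.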

ggerganov/whisper.cpp issue #1851