Vaibhavs10 / insanely-fast-whisper

Apache License 2.0
7.5k stars, 529 forks

How is the accuracy and memory usage as compared to Faster Whisper? #167

Open bakermanbrian opened 8 months ago

bakermanbrian commented 8 months ago

I am using Faster Whisper, whose accuracy is supposed to be the same as the OpenAI model while using much less memory. How does Insanely Fast Whisper compare on both of those fronts?

8090s commented 8 months ago

same question

stri8ed commented 6 months ago

Anecdotally, I would say the accuracy is worse. The Hugging Face implementation, which this project uses, applies a stride to the chunks, meaning some of the input is duplicated across multiple chunks. They do this to add context, compensating for the fact that the batched implementation does not get context from the previous chunk, since the chunks are run in parallel.

The issue with this is that it often results in the same text being repeated multiple times because of the overlapping chunks. In theory this is corrected by some heuristics, but in practice I have not found them to work well.
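To make the overlap concrete, here is a minimal sketch of how strided chunking carves up an audio file. It is not the Hugging Face code: the function names, the 30-second chunk length, and the 5-second stride are assumptions chosen for illustration, and a real pipeline works on samples rather than seconds and treats the first and last chunk specially.

```python
def chunk_windows(total_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) windows in seconds for strided chunking.

    Assumption: each chunk carries stride_s of extra context on both
    sides, so consecutive chunks advance by chunk_s - 2 * stride_s.
    """
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("stride_s must be less than half of chunk_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows


def overlaps(windows):
    """Seconds of audio shared by each pair of neighbouring windows."""
    return [
        max(0.0, prev_end - next_start)
        for (_, prev_end), (next_start, _) in zip(windows, windows[1:])
    ]


if __name__ == "__main__":
    windows = chunk_windows(70.0)
    print(windows)
    print(overlaps(windows))
```

Under these assumptions every pair of neighbouring chunks transcribes the same 10 seconds of audio twice, and it is this shared region that the merging heuristics have to deduplicate.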

Also, the Hugging Face implementation does not apply the hallucination checks to chunks shorter than 30 seconds, and in the case of batch inference, every chunk is 30 seconds or less.

Fundamentally, I don't see how a parallelized version of Whisper can achieve the same accuracy as the original serial one, since it lacks the context from previous chunks, which often helps resolve ambiguity.

https://huggingface.co/blog/asr-chunking

arasaahov commented 3 months ago

I haven't found anything for Whisper hallucinations so far, so I ended up writing a simple post-processing workaround for repeated chunks/hallucinations.

import sys
from collections import defaultdict


def find_identical_rows(file_path, max_distance):
    """Print lines that repeat within max_distance lines of each other."""
    line_positions = defaultdict(list)

    # Record the 1-based line numbers at which each non-empty line occurs
    with open(file_path, 'r') as file:
        for line_number, line in enumerate(file, start=1):
            # Remove any leading/trailing whitespace characters
            line = line.strip()
            if not line:  # Skip empty rows
                continue
            line_positions[line].append(line_number)

    # Keep lines with at least one pair of consecutive occurrences
    # that are no more than max_distance lines apart
    identical_rows = []
    for line, positions in line_positions.items():
        if any(b - a <= max_distance for a, b in zip(positions, positions[1:])):
            identical_rows.append((line, positions))

    # Print lines that meet the criteria
    if identical_rows:
        print(file_path)
        for row, positions in identical_rows:
            print(f"(Positions: {positions}) {row}")


if __name__ == "__main__":
    input_file = sys.argv[1]  # Get the input file path from command-line arguments
    find_identical_rows(input_file, 12)

Example run:

identical-rows.py 01-large-v2.srt
01-large-v2.srt
(Positions: [91, 99, 383, 1459, 4087, 4887]) Yeah.
(Positions: [363, 375]) Hello.
(Positions: [1895, 1899, 1903, 1907, 1911, 1915, 1919, 1923, 1927, 1931, 1935, 1939, 1943, 1947, 1951, 1955, 1959, 1963, 1967, 2899, 2903, 2907, 2911, 2915, 2919, 2923, 2927, 2931, 2935, 2939, 2943, 2947, 2951, 2955, 2959, 2963, 2967, 2971, 3599, 3603, 3607, 3611, 3615, 3619, 3623, 3627, 3631, 3635, 3639, 3643, 3647, 3651, 3655, 3659, 3663, 3667, 3671, 3763, 3767, 3771, 3775, 3779, 3783, 3787, 3791, 3795, 3799, 3803, 3807, 3811, 3815, 3819, 3823, 3827, 3831, 4039, 4307, 4311, 4315, 4319, 4323, 4327, 4331, 4335, 4339, 4343, 4347, 4351, 4355, 4359, 4363, 4367, 4371, 4375, 4379, 4399, 4403, 4407, 4411, 4415, 4419, 4423, 4427, 4431, 4435, 4439, 4443, 4447, 4451, 4455, 4459, 4463, 4467, 4471, 4479, 4483, 4487, 4491, 4495, 4499, 4503, 4507, 4511, 4515, 4519, 4523, 4527, 4531, 4535, 4539, 4543, 4547, 4551, 5159]) Okay.
(Positions: [3507, 3583, 3587, 4571, 4575, 4579]) Silence.
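
The script above only reports repeats. As a possible next step, the sketch below drops cues that extend a run of identical consecutive text past a threshold. It is an illustration, not part of the original workaround: `parse_srt`, `collapse_repeats`, and the `max_run` parameter are hypothetical names, and the parser assumes the plain SRT layout of index line, timing line, text lines, and a blank separator.

```python
import re


def parse_srt(text):
    """Split SRT text into cues; assumes index, timing, then text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3:
            cues.append({"timing": lines[1], "text": " ".join(lines[2:]).strip()})
    return cues


def collapse_repeats(cues, max_run=2):
    """Keep at most max_run consecutive cues that share the same text."""
    kept = []
    run_text, run_len = None, 0
    for cue in cues:
        if cue["text"] == run_text:
            run_len += 1
        else:
            run_text, run_len = cue["text"], 1
        if run_len <= max_run:
            kept.append(cue)
    return kept


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:02,000\nOkay.\n\n"
        "2\n00:00:02,000 --> 00:00:03,000\nOkay.\n\n"
        "3\n00:00:03,000 --> 00:00:04,000\nOkay.\n\n"
        "4\n00:00:04,000 --> 00:00:05,000\nHello.\n"
    )
    for cue in collapse_repeats(parse_srt(sample), max_run=1):
        print(cue["timing"], cue["text"])
```

Short replies such as "Yeah." can legitimately repeat in real speech, so a threshold like this risks deleting genuine dialogue; the dropped cues are worth reviewing before the file is overwritten.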