Vaibhavs10 / insanely-fast-whisper

Trying to run insanely fast whisper on CPU - #153

Closed Maldoror1900 closed 6 months ago

Maldoror1900 commented 6 months ago

I'm using this code to run on CPU in Colab, but I consistently end up with this error:

!pip install --upgrade transformers optimum accelerate
!pip install --upgrade "transformers>=4.36"
!pip install --upgrade "torch>=2.1.1"

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3", # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cpu", # or mps for Mac devices
    model_kwargs={"use_flash_attention_2": is_flash_attn_2_available()},
)

if not is_flash_attn_2_available():
    # enable flash attention through pytorch sdpa
    pipe.model = pipe.model.to_bettertransformer()

outputs = pipe(
    "/content/drive/MyDrive/aurore.wav",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs

ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type whisper. Please upgrade to transformers>=4.36 and torch>=2.1.1 to use it. Details: https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention

Any idea how to fix this? I just want to see how fast insanely-fast-whisper can be on CPU...
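Note that Colab keeps an already-imported package in memory until the runtime is restarted, so an upgrade in the same session may not take effect. A minimal sketch to verify that the versions the error message asks for are actually loaded:

import torch
import transformers

# The ValueError asks for transformers >= 4.36 and torch >= 2.1.1;
# print what the runtime actually loaded after the upgrade.
print(transformers.__version__)
print(torch.__version__)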

Vaibhavs10 commented 6 months ago

Oh, it looks like an issue with the snippet. Can you try this instead?


!pip install --upgrade transformers optimum accelerate
!pip install --upgrade "transformers>=4.36"
!pip install --upgrade "torch>=2.1.1"

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3", # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cpu", # or mps for Mac devices
    model_kwargs={"use_flash_attention_2": is_flash_attn_2_available()},
)

outputs = pipe(
    "/content/drive/MyDrive/aurore.wav",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
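The difference from the original snippet is that the pipe.model.to_bettertransformer() call is gone: on transformers >= 4.36, Whisper already routes attention through PyTorch's scaled_dot_product_attention, which is exactly what the ValueError was pointing at. If you want to request that path explicitly, a minimal sketch (assuming attn_implementation is accepted as a model kwarg, as it is on recent transformers versions):

import torch
from transformers import pipeline

# "sdpa" selects torch.nn.functional.scaled_dot_product_attention;
# on CPU, float32 avoids the half-precision kernel issue reported below.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float32,
    device="cpu",
    model_kwargs={"attn_implementation": "sdpa"},
)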

gangula-karthik commented 3 months ago

That code doesn't work for me. I get this error: RuntimeError: Input type (float) and bias type (c10::Half) should be the same

To fix it, I changed torch_dtype from float16 to float32, and that solved the issue.

# !pip install --upgrade transformers optimum accelerate
# !pip install --upgrade "transformers>=4.36"
# !pip install --upgrade "torch>=2.1.1"

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

AUDIO_FILE_PATH = "/working/testing.mp3"

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en", # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float32,
    device="cpu", # or mps for Mac devices
    model_kwargs={"use_flash_attention_2": is_flash_attn_2_available()},
)

outputs = pipe(
    AUDIO_FILE_PATH,
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
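That dtype change makes sense: the feature extractor feeds float32 tensors into the model, and half-precision weights on CPU produce exactly this mismatch, since most CPU kernels don't support float16. A minimal sketch that picks the dtype from the device instead of hard-coding it:

import torch

# float16 only pays off on a CUDA GPU; on CPU, stick to float32 to avoid
# the "Input type (float) and bias type (c10::Half)" RuntimeError.
use_cuda = torch.cuda.is_available()
device = "cuda:0" if use_cuda else "cpu"
torch_dtype = torch.float16 if use_cuda else torch.float32

Then pass device=device and torch_dtype=torch_dtype into pipeline(...) as above.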