Oh, it looks like there's an issue with the snippet. Can you try this one instead?
!pip install --upgrade transformers optimum accelerate
!pip install --upgrade "transformers>=4.36"
!pip install --upgrade "torch>=2.1.1"
import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cpu",  # or "mps" for Mac devices
    model_kwargs={"use_flash_attention_2": is_flash_attn_2_available()},
)

outputs = pipe(
    "/content/drive/MyDrive/aurore.wav",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)
outputs
That code doesn't work for me. I get this error:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
To fix it, I changed torch_dtype from float16 to float32, and that solved the issue.
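A minimal sketch of why that works (my own illustration, not from the thread): the checkpoint weights are loaded in fp16 while the CPU feature extractor feeds fp32 tensors, so tying the dtype to the device avoids the Half/float mismatch while still keeping fp16 where it helps:

import torch

# Assumption: fp32 on CPU, fp16 only when a CUDA device is available.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device.startswith("cuda") else torch.float32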
# !pip install --upgrade transformers optimum accelerate
# !pip install --upgrade "transformers>=4.36"
# !pip install --upgrade "torch>=2.1.1"
import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

AUDIO_FILE_PATH = "/working/testing.mp3"

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en",  # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float32,
    device="cpu",  # or "mps" for Mac devices
    model_kwargs={"use_flash_attention_2": is_flash_attn_2_available()},
)

outputs = pipe(
    AUDIO_FILE_PATH,
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)
outputs
I'm using this code to run on CPU on Colab, but I systematically end up with this error:
ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type whisper. Please upgrade to transformers>=4.36 and torch>=2.1.1 to use it. Details: https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention
Any idea how to fix this? I just want to see how fast insanely-fast-whisper can be on CPU...
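One thing worth checking (my own suggestion, not verified against this exact setup): this error usually means the runtime still has an older transformers/torch than the install lines asked for, possibly because the unquoted >=4.36 / >=2.1.1 specifiers were swallowed by shell redirection. A hedged sketch of how I'd verify the versions and, once they are recent enough, drop the flash-attention kwarg in favour of PyTorch's native scaled_dot_product_attention via the attn_implementation argument (available in transformers>=4.36):

import torch
import transformers

# The error message asks for transformers>=4.36 and torch>=2.1.1; confirm what the
# Colab runtime actually has (a runtime restart may be needed after upgrading).
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)

# Assumption: with recent enough versions, the "sdpa" attention implementation is
# the CPU-friendly path; "flash_attention_2" only applies on supported GPUs.
pipe = transformers.pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en",
    torch_dtype=torch.float32,
    device="cpu",
    model_kwargs={"attn_implementation": "sdpa"},
)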