Vaibhavs10 / insanely-fast-whisper


ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type whisper. Please upgrade to transformers>=4.36 and torch>=2.1.1 to use it #151

Closed: MiningIrving closed this issue 6 months ago

MiningIrving commented 6 months ago

When I run the following code, an error occurs:

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny", # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cuda:0", # or mps for Mac devices
    model_kwargs={"use_flash_attention_2": is_flash_attn_2_available()},
)

if not is_flash_attn_2_available():
    # enable flash attention through pytorch sdpa
    pipe.model = pipe.model.to_bettertransformer()

outputs = pipe(
    "hello.mp3",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

It outputs the following error:

Traceback (most recent call last):
  File "infer.py", line 15, in <module>
    pipe.model = pipe.model.to_bettertransformer()
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 4314, in to_bettertransformer
    return BetterTransformer.transform(self)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/optimum/bettertransformer/transformation.py", line 211, in transform
    raise ValueError(
ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type whisper. Please upgrade to transformers>=4.36 and torch>=2.1.1 to use it. Details: https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention

My environment:

torch 2.1.2
transformers 4.36.2

I installed insanely-fast-whisper with pip. I only want to use it from Python code, not via pipx.

ramonsaraiva commented 6 months ago

Same here with transformers==4.36.2 and torch==2.3.0 with mps:

ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type whisper. Please upgrade to transformers>=4.36 and torch>=2.1.1 to use it. Details: https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention

pip show outputs

pip show transformers torch

Name: transformers
Version: 4.36.2

Name: torch
Version: 2.3.0.dev20231227

Emveez commented 6 months ago

Having the same problem

meditans commented 6 months ago

Having the same problem

>>> torch.__version__
'2.1.2+cu121'
>>> transformers.__version__
'4.36.2'

meditans commented 6 months ago

Ok, I solved the problem by actually reading the README:

⚠️ If you have python 3.11.XX installed, pipx may parse the version incorrectly and install a very old version of insanely-fast-whisper without telling you (version 0.0.8, which won't work anymore with the current BetterTransformers). In that case, you can install the latest version by passing --ignore-requires-python to pip

I had version 0.0.8, which I checked with pip show insanely-fast-whisper, and corrected the problem using the pip incantation in the readme (now I have 0.0.13 and everything works).
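
For anyone who prefers checking from Python instead of pip show, here is a small sketch of my own (not from the README) that reads the installed version via importlib.metadata:

# Illustrative only: confirm which insanely-fast-whisper version the current
# interpreter's environment actually has installed.
from importlib.metadata import version, PackageNotFoundError

try:
    v = version("insanely-fast-whisper")
    # 0.0.8 is the old release that no longer works with current BetterTransformer
    print("insanely-fast-whisper", v)
except PackageNotFoundError:
    print("insanely-fast-whisper is not installed in this environment")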

curiouscod3 commented 6 months ago

ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type whisper. Please upgrade to transformers>=4.36 and torch>=2.1.1 to use it. Details: https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention

I'm on Python 3.11 and installed transformers>=4.36 and torch>=2.1.1, but it keeps complaining like that :(

Mijawel commented 6 months ago

The README tip doesn't help if you aren't using the insanely-fast-whisper package itself.

wimiam1 commented 6 months ago

I am having the same problem. Confirmed I'm running Python 3.11 and insanely-fast-whisper 0.0.13

Vaibhavs10 commented 6 months ago

Hi All! Sorry for the delay in responding to you. I've updated the snippet in the README:

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3", # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cuda:0", # or mps for Mac devices
    model_kwargs={"attn_implementation": "flash_attention_2"} if is_flash_attn_2_available() else {"attn_implementation": "sdpa"},
)

outputs = pipe(
    "<FILE_NAME>",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
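
If you want to double-check which attention backend the pipeline ended up with, here is a hedged debugging aid (it reads a private transformers config attribute, so it may change between versions):

# Debugging aid only: on recent transformers (>=4.36) the config records the
# selected attention backend in the internal `_attn_implementation` attribute.
print(pipe.model.config._attn_implementation)  # e.g. "flash_attention_2" or "sdpa"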

I hope this helps!

(closing this issue for now, feel free to re-open if you have any issues.)