Vaibhavs10 / insanely-fast-whisper


You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. #253

Open solarslurpi opened 1 day ago

solarslurpi commented 1 day ago

I am using WSL and have CUDA 12.4 installed. I installed Flash Attention with `pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation`.

Command: `insanely-fast-whisper --model-name "openai/whisper-large-v3-turbo" --flash True --file-name test.mp3`. I got the same warning when I did not provide the model name.

I expected the model to already be on the GPU. How do I fix this warning: "You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`." Thank you.
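For reference, the pattern the warning is asking for looks roughly like this in plain `transformers` (a minimal sketch, not the insanely-fast-whisper internals):

```python
# Minimal sketch of what the warning asks for: load the model with
# Flash Attention 2 enabled, then move it to the GPU with .to("cuda").
import torch
from transformers import AutoModelForSpeechSeq2Seq

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,               # Flash Attention 2 needs fp16/bf16
    attn_implementation="flash_attention_2",
)
model.to("cuda")  # the warning fires if this step is skipped
```

If insanely-fast-whisper emits the warning even with `--flash True` and `--device-id 0`, the model is presumably being loaded before the move to the GPU happens, which is what this issue is about.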

iwyc0 commented 13 hours ago

Same problem here.