Vaibhavs10 / insanely-fast-whisper


Getting `Use model.to('Cuda')` when trying to use Flash Attention #210


eburgwedel commented 2 months ago

I installed all the necessary drivers and packages, including nvcc to build Flash Attention; all smooth sailing. Everything works fine, but when I run

insanely-fast-whisper --file-name audio.ogg --flash True

I get the following warning:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')

The transcription still works, but, so I assume, without Flash Attention.
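For reference, here is roughly what I understand the CLI to be doing under the hood, written out as a plain transformers pipeline (the model id and parameters are my assumptions, not necessarily the tool's exact defaults). Loading straight onto the GPU with `device_map` should avoid the CPU-initialization path the warning complains about:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "openai/whisper-large-v3"  # assumed; may not be the CLI's default

# Loading the weights directly onto the GPU via device_map avoids
# initializing the model on CPU first, which is what triggers the warning.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # Flash Attention 2 requires fp16/bf16
    attn_implementation="flash_attention_2",
    device_map="cuda:0",
)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

result = pipe("audio.ogg")
print(result["text"])
```

If the warning is only about load order (model built on CPU, then moved), it may well be benign: Flash Attention 2 would still kick in once the weights are on the GPU.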

I tried it on Ubuntu 22.04.4 LTS on a 24-core machine with 32 GB RAM and an RTX 4090 (24 GB), and also on an A100 (80 GB).

What did I miss?
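Is there a reliable way to confirm which attention implementation was actually selected? The best I could find is the private `_attn_implementation` attribute on the model config, which may change between transformers versions:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq

# Quick check via a private attribute (not a stable API):
# which attention implementation did transformers pick?
m = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
print(m.config._attn_implementation)  # expect "flash_attention_2"
```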

PS: Maybe I should add that this is an awesome project. Thank you.

peterschmidt85 commented 1 month ago

@Vaibhavs10 CC