I installed all the necessary drivers and packages, including nvcc to build Flash Attention; all smooth sailing. Everything works fine, but when I run
insanely-fast-whisper --file-name audio.ogg --flash True
I get the following warning:
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')
The transcription still works, though presumably without Flash Attention.
I tried this on Ubuntu 22.04.4 LTS, both on a 24-core machine with 32 GB RAM and an RTX 4090 (24 GB), and on an A100 (80 GB).
What did I miss?
PS: I should add that this is an awesome project. Thank you.