Vaibhavs10 / insanely-fast-whisper

Apache License 2.0

`The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device.` while running on macOS #225

Open paulz opened 4 months ago

paulz commented 4 months ago
NotImplementedError:
The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
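For anyone who wants to check whether their installed PyTorch build already has this op on MPS, a minimal probe might look like the sketch below. It assumes that calling `torch.isin` with two tensors reaches the operator named in the error. The function name `isin_supported_on_mps` is made up for illustration. With `PYTORCH_ENABLE_MPS_FALLBACK=1` set, the call would presumably succeed through the CPU fallback, so run it without that variable.

```python
import importlib.util


def isin_supported_on_mps():
    """Probe torch.isin on the MPS device.

    Returns True if the call succeeds, False if PyTorch raises
    NotImplementedError, and None if torch or MPS is unavailable.
    (Hypothetical helper, not part of insanely-fast-whisper.)
    """
    if importlib.util.find_spec("torch") is None:
        return None  # torch is not installed in this environment

    import torch

    mps = getattr(torch.backends, "mps", None)
    if mps is None or not mps.is_available():
        return None  # no MPS backend on this machine or build

    try:
        elements = torch.tensor([1, 2, 3], device="mps")
        test_elements = torch.tensor([2, 3], device="mps")
        torch.isin(elements, test_elements)
        return True
    except NotImplementedError:
        return False


if __name__ == "__main__":
    print("torch.isin on MPS:", isin_supported_on_mps())
```

A result of `False` suggests the build still lacks the op. `None` means the probe could not run at all.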

and using PYTORCH_ENABLE_MPS_FALLBACK=1 causes very slow performance:

PYTORCH_ENABLE_MPS_FALLBACK=1 insanely-fast-whisper --device-id mps --file-name long-audio.mp3
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
🤗 Transcribing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:01:21
Voila!✨ Your file has been transcribed go check it out over here 👉 output.json

JustinGuese commented 4 months ago

same

mertbozkir commented 4 months ago

Same for me, how can I solve this? @Vaibhavs10

piercecohen1 commented 3 months ago

+1

flaviodelgrosso commented 3 months ago

Running this: pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu should solve the issue, but the following error occurs:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyannote-audio 3.2.0 requires torchaudio>=2.2.0, but you have torchaudio 2.2.0.dev20240529 which is incompatible.
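The conflict looks odd at first, but it follows from PEP 440 ordering: a `.dev` build sorts before the final release of the same version, so `2.2.0.dev20240529` does not satisfy `>=2.2.0`. A small sketch with the `packaging` library (assumed to be installed; pip carries its own vendored copy) shows the comparison:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Versions taken from the pip error message above.
installed = Version("2.2.0.dev20240529")
requirement = SpecifierSet(">=2.2.0")

# A dev release is ordered before the final release it leads up to.
sorts_before_final = installed < Version("2.2.0")

# Even with pre-releases allowed, the dev build is below the lower bound.
satisfies = requirement.contains(installed, prereleases=True)

print("sorts before 2.2.0:", sorts_before_final)
print("satisfies >=2.2.0:", satisfies)
```

This only explains why the resolver complains. It does not say whether pyannote-audio works with the nightly build in practice.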

Vaibhavs10 commented 3 months ago

Sorry for the delay in responding to this. Given the current requirement constraints, AFAIK we'll need to wait for the next stable torch release (which should be soon).

BaronZack commented 3 months ago

This bug is fixed in the PyTorch 2.4.0 nightly build. Try uninstalling torch and reinstalling with: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

adamjgrant commented 2 months ago

I ran

pip uninstall torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

And reattempted, but still encountered this error.
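When a reinstall seems to have no effect, it may help to confirm which versions are visible to the Python interpreter that actually runs the CLI. The sketch below uses only the standard library. The helper name `installed_versions` and the default package list are chosen for illustration.

```python
from importlib import metadata


def installed_versions(
    packages=("torch", "torchvision", "torchaudio", "insanely-fast-whisper"),
):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for debugging; it only reads package metadata and
    does not import torch itself.
    """
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that the `insanely-fast-whisper` command uses. If torch still shows a stable version rather than a `.dev` one, the nightly install likely went into a different environment.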

harlantwood commented 2 months ago

I got this working within miniconda ( https://docs.anaconda.com/miniconda/miniconda-install/ ) --

conda create -n insane-whisper python=3.12 -y
conda activate insane-whisper
pip3 uninstall torch torchvision torchaudio
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install insanely-fast-whisper

Now it works well within this conda env. Note that even though I also still have it installed outside of conda, within conda the correct version will be run:

$ which -a insanely-fast-whisper
~/miniconda3/envs/insane-whisper/bin/insanely-fast-whisper
~/.local/bin/insanely-fast-whisper

joliss commented 1 month ago

This error used to happen for me on macOS, but I just retried it, and it seems to work fine now. I was/am running the following command:

$ insanely-fast-whisper --device-id mps --file-name foo.wav
🤗 Transcribing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00
You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.
🤗 Transcribing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:01
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
🤗 Transcribing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:01
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
🤗 Transcribing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:12
Voila!✨ Your file has been transcribed go check it out over here 👉 output.json

I'm running insanely-fast-whisper 0.0.15, but I'm not sure what version I was running when it failed.

That said, while it is indeed running on the GPU, it's still slightly slower than whisper.cpp with large-v3. Not sure if that means that something's going wrong.