coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Unable to use xtts_v2 with mps device on Apple Silicon #3649

Closed vesper8 closed 3 months ago

vesper8 commented 6 months ago

Describe the bug

I have an M1 Max with 32 cores and 64 GB of unified memory, so if MPS is meant to work, it should be quite fast. Currently it doesn't work at all; it just hangs.

I'm running it this way: tts --device mps --model_name "tts_models/multilingual/multi-dataset/xtts_v2" --speaker_idx 'Daisy Studious' --language_idx en --text "Hello world" --out_path ./output/speech.wav

If I change the device to cpu, it works just fine and rather quickly. If I change it to mps, it just hangs until I press Ctrl-C.

I also have this info:

PyTorch version: 2.2.2
Is MPS (Metal Performance Shader) built? True
Is MPS available? True
Using device: mps
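
For anyone who wants to compare their setup, status lines like the ones above can be produced with a check along these lines. This is a minimal sketch, not the script the reporter actually ran: `format_mps_report` and `main` are hypothetical helper names, and only `torch.__version__`, `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()` are real PyTorch calls.

```python
def format_mps_report(version: str, built: bool, available: bool) -> list[str]:
    """Build the status lines from plain values (hypothetical helper)."""
    # Fall back to CPU when MPS is not usable on this machine.
    device = "mps" if available else "cpu"
    return [
        f"PyTorch version: {version}",
        f"Is MPS (Metal Performance Shader) built? {built}",
        f"Is MPS available? {available}",
        f"Using device: {device}",
    ]


def main() -> None:
    try:
        import torch  # imported here so the helper above works without PyTorch
    except ImportError:
        print("PyTorch is not installed")
        return
    report = format_mps_report(
        torch.__version__,
        torch.backends.mps.is_built(),      # was this build compiled with MPS support?
        torch.backends.mps.is_available(),  # can MPS be used on this machine right now?
    )
    print("\n".join(report))


if __name__ == "__main__":
    main()
```

Note that "built" and "available" both being True only says the backend can be selected. It does not guarantee that every operator a model needs is implemented for MPS.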

Am I missing something?

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.2.2",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.10.10",
        "version": "Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000"
    }
}
stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

Panquesito0 commented 5 days ago

Hey. I've been looking for a way to use mps, and this is what I found. I used the following Python code:

import torch
from TTS.api import TTS
AUDIO_INPUT = './Input.mp3'    # reference audio for the speaker voice
AUDIO_OUTPUT = './Output.wav'  # where the synthesized speech is written
TEXT = 'Hello'
# Get device: CUDA if present, otherwise MPS (Apple Silicon)
device = "cuda" if torch.cuda.is_available() else "mps"
# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
# Text to speech to a file
tts.tts_to_file(text=TEXT, speaker_wav=AUDIO_INPUT, language="en", file_path=AUDIO_OUTPUT)

The following error is shown: The operator 'aten::_fft_r2c' is not currently implemented for the MPS device.

After searching for this error message on the internet, I found https://github.com/pytorch/pytorch/issues/116392. The problem appears to be related to PyTorch rather than to TTS itself. The linked issue mentions that they are working on MPS support for these models, so I guess we have to wait.
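
In the meantime, one thing that may be worth trying is PyTorch's CPU fallback for operators that MPS does not implement, enabled with the `PYTORCH_ENABLE_MPS_FALLBACK=1` environment variable. The sketch below is untested with xtts_v2 and nobody in this thread has confirmed it: `pick_device` and `detect_device` are hypothetical helper names. Setting the variable before `torch` is imported is the cautious choice, and whether it also avoids the hang reported at the top of this issue is unknown.

```python
import os

# Assumption: set this before torch is imported so that operators missing
# on MPS (such as aten::_fft_r2c) fall back to the CPU instead of raising.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, then CPU (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for the available backends and pick one."""
    import torch  # imported late, after the environment variable is set

    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

With this in place, the model would be moved with `TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(detect_device())`. Operators that fall back run on the CPU, so this may well be slower than running entirely on the CPU.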