coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] AttributeError: 'int' object has no attribute 'device' #3996

Open CrackerHax opened 1 month ago

CrackerHax commented 1 month ago

Describe the bug

The example code raises an error when saving.

To Reproduce

import os
import time
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

print("Loading model...")
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()

print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["reference.wav"])

print("Inference...")
t0 = time.time()
chunks = model.inference_stream(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    "en",
    gpt_cond_latent,
    speaker_embedding
)

wav_chunks = []
for i, chunk in enumerate(chunks):
    if i == 0:
        print(f"Time to first chunk: {time.time() - t0}")
    print(f"Received chunk {i} of audio length {chunk.shape[-1]}")
    wav_chunks.append(chunk)
# Join the streamed chunks into one waveform and save it as (channels, samples) at 24 kHz.
wav = torch.cat(wav_chunks, dim=0)
torchaudio.save("xtts_streaming.wav", wav.squeeze().unsqueeze(0).cpu(), 24000)

Expected behavior

The script is expected to save a WAV file.

Logs

Traceback (most recent call last):

    if elements.device.type == "mps" and not is_torch_greater_or_equal_than_2_4:
AttributeError: 'int' object has no attribute 'device'

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3080",
            "NVIDIA GeForce RTX 3080"
        ],
        "available": true,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.4.1+cu124",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.14",
        "version": "#1 SMP Thu Jan 11 04:09:03 UTC 2024"
    }
}

Additional context

AttributeError: 'int' object has no attribute 'device'

eginhard commented 1 month ago

Try using our fork (available via pip install coqui-tts). This repo is no longer updated, and the streaming code here doesn't work with recent versions of transformers. That's probably the reason, but it's hard to tell because you didn't include the full error log.
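For anyone hitting the same error, here is a minimal sketch of how the missing information could be gathered: the installed versions of the relevant packages and the complete traceback text. The helper names `installed_versions` and `run_and_capture` are hypothetical, made up for illustration, and are not part of TTS or transformers. It assumes Python 3.8+ for `importlib.metadata`.

```python
import traceback
from importlib import metadata


def installed_versions(packages):
    """Return {name: version} for each distribution, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def run_and_capture(fn):
    """Call fn(); return the full traceback as a string if it raises, else None."""
    try:
        fn()
    except Exception:
        # format_exc() keeps every frame, not just the final error line.
        return traceback.format_exc()
    return None
```

One way to use it: wrap the streaming loop from the report in a function, pass it to `run_and_capture`, and paste the returned text together with `installed_versions(["TTS", "coqui-tts", "transformers", "torch"])` into the issue.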