coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Input and parameter tensors are not at the same device. How to point the input tensor to cuda:2? #3985

Open · yiouyou opened 2 months ago

yiouyou commented 2 months ago

Describe the bug

Code:

from TTS.api import TTS
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda:2")
tts.voice_conversion_to_file(source_wav="_t1_source.wav", target_wav="_t1_target.wav", file_path="_t1.wav")

Error:

(tts) songz:~/TTS$ python _t1.py 
 > voice_conversion_models/multilingual/vctk/freevc24 is already downloaded.
 > Using model: freevc
 > Loading pretrained speaker encoder model ...
/home/songz/TTS/TTS/utils/io.py:51: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(f, map_location=map_location, **kwargs)
Loaded the voice encoder model on cuda in 0.75 seconds.
/home/songz/TTS/TTS/vc/modules/freevc/wavlm/__init__.py:26: FutureWarning: (same `torch.load` `weights_only=False` warning as above)
  checkpoint = torch.load(output_path, map_location=torch.device(device))
/home/songz/TTS/TTS/utils/io.py:54: FutureWarning: (same `torch.load` `weights_only=False` warning as above)
  return torch.load(f, map_location=map_location, **kwargs)
Traceback (most recent call last):
  File "/home/songz/TTS/__t1.py", line 6, in <module>
    tts.voice_conversion_to_file(source_wav="_t1_source.wav", target_wav="_t1_target.wav", file_path="_t1.wav")
  File "/home/songz/TTS/TTS/api.py", line 377, in voice_conversion_to_file
    wav = self.voice_conversion(source_wav=source_wav, target_wav=target_wav)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/TTS/TTS/api.py", line 358, in voice_conversion
    wav = self.voice_converter.voice_conversion(source_wav=source_wav, target_wav=target_wav)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/TTS/TTS/utils/synthesizer.py", line 254, in voice_conversion
    output_wav = self.vc_model.voice_conversion(source_wav, target_wav)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/miniconda3/envs/tts/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/TTS/TTS/vc/models/freevc.py", line 522, in voice_conversion
    g_tgt = self.enc_spk_ex.embed_utterance(wav_tgt)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/TTS/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 155, in embed_utterance
    partial_embeds = self(mels).cpu().numpy()
                     ^^^^^^^^^^
  File "/home/songz/miniconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/miniconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/TTS/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 60, in forward
    _, (hidden, _) = self.lstm(mels)
                     ^^^^^^^^^^^^^^^
  File "/home/songz/miniconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/miniconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/songz/miniconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/rnn.py", line 917, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input and parameter tensors are not at the same device, found input tensor at cuda:0 and parameter tensor at cuda:2

Question: how can I move the input tensor to cuda:2 so it matches the model parameters?
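
A possible workaround (untested here; it assumes the stray cuda:0 tensor is created on PyTorch's default CUDA device inside the speaker encoder) is to expose only the target GPU via CUDA_VISIBLE_DEVICES before torch initializes CUDA, so that physical GPU 2 becomes cuda:0:

import os

# Hide GPUs 0 and 1 so that physical GPU 2 is the only visible device (cuda:0).
# This must be set before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from TTS.api import TTS

# An alternative, if all GPUs must stay visible, is to pin the default device
# so tensors created with a bare "cuda" spec resolve to cuda:2:
# import torch; torch.cuda.set_device(2)

tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
tts.voice_conversion_to_file(source_wav="_t1_source.wav", target_wav="_t1_target.wav", file_path="_t1.wav")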

To Reproduce

python _t1.py

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA A100-SXM4-40GB",
            "NVIDIA A100-SXM4-40GB",
            "NVIDIA A100-SXM4-40GB"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.4.0+cu121",
        "TTS": "0.22.0",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.11.9",
        "version": "#29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2"
    }
}

Additional context

No response

isatyamks commented 1 month ago

@yiouyou Can you assign this issue to me?

isatyamks commented 1 month ago

from TTS.api import TTS
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda:2")
tts.voice_conversion_to_file(source_wav="_t1_source.wav", target_wav="_t1_target.wav", file_path="_t1.wav")

@yiouyou Can you please share the directory layout for this code block?

andrea-mucci commented 1 month ago

I have a similar error:

# the first audio is generated with a cloned voice
path = self.model.tts_to_file(text=text, speaker_wav=speaker_wav, language=language,
                              file_path=f"/tmp/output_{output_random}.wav")
# I take the audio generated by the text-to-speech step and force it to be converted with speaker_wav;
# the target is the path variable and the source is speaker_wav
self.conversion.voice_conversion_to_file(path, speaker_wav, file_path=new_output_path)
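
The same mismatch appears whenever the synthesis model and the conversion model end up on different devices. A minimal sketch that pins both to one explicit device (the XTTS model name, attribute names, and file paths here are only illustrative):

import torch
from TTS.api import TTS

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Keep both models on the same device so the tensors they exchange match.
tts_model = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2", progress_bar=False).to(device)
vc_model = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to(device)

# Generate speech with a cloned voice, then run the result through voice conversion.
path = tts_model.tts_to_file(text="Hello world.", speaker_wav="speaker.wav", language="en", file_path="/tmp/output.wav")
vc_model.voice_conversion_to_file(source_wav=path, target_wav="speaker.wav", file_path="/tmp/converted.wav")
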
stale[bot] commented 2 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check our discussion channels.