2noise / ChatTTS

A generative speech model for daily dialogue.
https://2noise.com
GNU Affero General Public License v3.0

RuntimeError: stft input and window must be on the same device but got self on cpu and window on cuda:0 #603

Open Strive-for-excellence opened 1 month ago

Strive-for-excellence commented 1 month ago

When I use the zero-shot feature on the web version, I get an error.

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/gradio/blocks.py", line 1897, in process_api
    result = await self.call_function(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/gradio/blocks.py", line 1483, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/gradio/utils.py", line 816, in wrapper
    response = f(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/gitlab/chat/ChatTTS/examples/web/funcs.py", line 121, in on_upload_sample_audio
    spk_smp = chat.sample_audio_speaker(sample_audio)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/gitlab/chat/ChatTTS/ChatTTS/core.py", line 167, in sample_audio_speaker
    return self.tokenizer._encode_prompt(self.dvae(wav, "encode").squeeze_(0))
  File "/mnt/cache/zhangxingyan/gitlab/chat/ChatTTS/ChatTTS/model/dvae.py", line 248, in __call__
    return super().__call__(inp, mode)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/gitlab/chat/ChatTTS/ChatTTS/model/dvae.py", line 255, in forward
    mel = self.preprocessor_mel(inp)
  File "/mnt/cache/zhangxingyan/gitlab/chat/ChatTTS/ChatTTS/model/dvae.py", line 197, in __call__
    return super().__call__(audio)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/gitlab/chat/ChatTTS/ChatTTS/model/dvae.py", line 200, in forward
    mel: torch.Tensor = self.mel_spec(audio)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 619, in forward
    specgram = self.spectrogram(waveform)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 110, in forward
    return F.spectrogram(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torchaudio/functional/functional.py", line 126, in spectrogram
    spec_f = torch.stft(
  File "/mnt/cache/zhangxingyan/env/chat/lib/python3.10/site-packages/torch/functional.py", line 665, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft input and window must be on the same device but got self on cpu and window on cuda:0
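For context, the mismatch appears to come from torchaudio's `MelSpectrogram` inside the DVAE: the transform registers its STFT window as a buffer, so the window follows the module onto cuda:0, while the uploaded sample audio stays a CPU tensor. A minimal sketch that reproduces the same error outside ChatTTS (the sample rate and device are placeholders for illustration):

    import torch
    import torchaudio

    # MelSpectrogram registers its Hann window as a buffer, so moving the
    # module to CUDA also moves the window to cuda:0.
    mel_spec = torchaudio.transforms.MelSpectrogram(sample_rate=24000).to("cuda:0")

    # The uploaded sample arrives as a CPU tensor (random audio as a stand-in).
    wav = torch.randn(1, 24000)

    # Raises: RuntimeError: stft input and window must be on the same device ...
    mel_spec(wav)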

Maybe this will work? It moves the input onto the DVAE's device before encoding and brings the result back to the CPU:

    @torch.inference_mode()
    def sample_audio_speaker(self, wav: Union[np.ndarray, torch.Tensor]) -> str:
        if isinstance(wav, np.ndarray):
            wav = torch.from_numpy(wav)
-        return self.tokenizer._encode_prompt(self.dvae(wav, "encode").squeeze_(0))
+
+        return self.tokenizer._encode_prompt(self.dvae(wav.to(self.std.device), "encode").squeeze_(0).cpu())
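As a stopgap on the caller side (without patching core.py), the upload handler could also move the audio onto the GPU itself before calling the method, since the signature already accepts a torch.Tensor. This is only a sketch: the "cuda:0" device is an assumption about where the models were loaded, `sample_audio` is assumed to already be the float32 numpy waveform in `funcs.py`, and the tokenizer step may still expect the encoded result back on the CPU, as the `.cpu()` in the patch above accounts for.

    import torch

    # Hypothetical change in examples/web/funcs.py (on_upload_sample_audio):
    # move the uploaded audio to the GPU so it matches the DVAE's device.
    wav = torch.from_numpy(sample_audio).to("cuda:0")  # "cuda:0" is assumed
    spk_smp = chat.sample_audio_speaker(wav)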
fumiama commented 1 month ago

Try the latest dev branch.

CaptainDP commented 1 week ago

The problem still exists on the dev branch.

Code version:

    commit 9f9abeccac43f13c7b60aef85b9bb20ee8f0ed9e (HEAD, origin/dev)
    Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Date:   Sun Aug 25 20:22:51 2024 +0800

Problem description:

  File "/root/miniconda3/lib/python3.10/site-packages/torch/functional.py", line 650, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft input and window must be on the same device but got self on cpu and window on cuda:0