2noise / ChatTTS

A generative speech model for daily dialogue.
https://2noise.com
GNU Affero General Public License v3.0

RuntimeError: Input Tensor has to be 2D. #635

Closed: mysteryX1 closed this issue 2 months ago

mysteryX1 commented 3 months ago

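Hello, I ran into the following problem while running the demo program and would like to ask how to solve it. My machine runs Ubuntu 22.04 with CUDA 11.8, and the installed packages are torch 2.3.0, torchvision==0.18.0 and torchaudio==2.3.0. The full error output is as follows: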

found invalid characters: {'1'}
found invalid characters: {'2'}
text:   0%|                                                                                                                                                                                                              | 0/384(max) [00:00, ?it/s]We detected that you are passing past_key_values as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate Cache class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
text:   3%|█████▏                                                                                                                                                                                                   | 10/384(max) [00:00, 39.61it/s]
code:   4%|███████▋                                                                                                                                                                                               | 79/2048(max) [00:00, 114.92it/s]
/home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torchaudio/_backend/ffmpeg.py:245: UserWarning: The use of x.T on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider x.mT to transpose batches of matrices or x.permute(*torch.arange(x.ndim - 1, -1, -1)) to reverse the dimensions of a tensor. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3675.)
  src = src.T
Traceback (most recent call last):
  File "/home/wwd/cza/GPT4-RobotDog-main/chatTTS.py", line 13, in <module>
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
  File "/home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torchaudio/_backend/utils.py", line 313, in save
    return backend.save(
  File "/home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torchaudio/_backend/ffmpeg.py", line 316, in save
    save_audio(
  File "/home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torchaudio/_backend/ffmpeg.py", line 257, in save_audio
    s.write_audio_chunk(0, src)
  File "/home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/io/_streaming_media_encoder.py", line 469, in write_audio_chunk
    self._s.write_audio_chunk(i, chunk, pts)
RuntimeError: Input Tensor has to be 2D.
Exception raised from validate_audio_input at /__w/audio/audio/pytorch/audio/src/libtorio/ffmpeg/stream_writer/tensor_converter.cpp:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7feacaed0897 in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7feacae80bee in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x57457 (0x7fea12185457 in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #3: <unknown function> + 0x57e0c (0x7fea12185e0c in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #4: torio::io::TensorConverter::convert(at::Tensor const&) + 0x33 (0x7fea12187a33 in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #5: torio::io::EncodeProcess::process(at::Tensor const&, std::optional<double> const&) + 0xbe (0x7fea1217692e in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #6: torio::io::StreamingMediaEncoder::write_audio_chunk(int, at::Tensor const&, std::optional<double> const&) + 0xa5 (0x7fea121820c5 in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #7: <unknown function> + 0x39f16 (0x7fea0f962f16 in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/_torio_ffmpeg4.so)
frame #8: <unknown function> + 0x326a5 (0x7fea0f95b6a5 in /home/wwd/anaconda3/envs/chatTTS/lib/python3.9/site-packages/torio/lib/_torio_ffmpeg4.so)
frame #9: <unknown function> + 0x14e6b6 (0x558bb618c6b6 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #10: _PyObject_MakeTpCall + 0x2ec (0x558bb61757ac in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #11: <unknown function> + 0x14cc3e (0x558bb618ac3e in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #12: _PyEval_EvalFrameDefault + 0x4abe (0x558bb6171a2e in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #13: <unknown function> + 0x12e184 (0x558bb616c184 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #14: _PyFunction_Vectorcall + 0xd9 (0x558bb617d559 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x696 (0x558bb616d606 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #16: <unknown function> + 0x12e184 (0x558bb616c184 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #17: _PyFunction_Vectorcall + 0xd9 (0x558bb617d559 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x3e2 (0x558bb616d352 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #19: <unknown function> + 0x12e184 (0x558bb616c184 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #20: _PyFunction_Vectorcall + 0xd9 (0x558bb617d559 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #21: _PyEval_EvalFrameDefault + 0x4abe (0x558bb6171a2e in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #22: <unknown function> + 0x12e184 (0x558bb616c184 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #23: _PyFunction_Vectorcall + 0xd9 (0x558bb617d559 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x4abe (0x558bb6171a2e in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #25: <unknown function> + 0x12e184 (0x558bb616c184 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #26: _PyEval_EvalCodeWithName + 0x48 (0x558bb616be58 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #27: PyEval_EvalCodeEx + 0x39 (0x558bb616be09 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #28: PyEval_EvalCode + 0x1b (0x558bb62194ab in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #29: <unknown function> + 0x20887a (0x558bb624687a in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #30: <unknown function> + 0x204c03 (0x558bb6242c03 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #31: <unknown function> + 0x98175 (0x558bb60d6175 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #32: PyRun_SimpleFileExFlags + 0x1b1 (0x558bb623c841 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #33: Py_RunMain + 0x38d (0x558bb6239c6d in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #34: Py_BytesMain + 0x37 (0x558bb620d2e7 in /home/wwd/anaconda3/envs/chatTTS/bin/python)
frame #35: <unknown function> + 0x29d90 (0x7feb1d429d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #36: __libc_start_main + 0x80 (0x7feb1d429e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #37: <unknown function> + 0x1cf1e1 (0x558bb620d1e1 in /home/wwd/anaconda3/envs/chatTTS/bin/python)

menphix-watanabe commented 3 months ago

Removing `.unsqueeze(0)` fixes it.

fumiama commented 3 months ago

Duplicate of #621.

mysteryX1 commented 3 months ago

I am now hitting the same problem as #599:

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
text:   3%|█████▏                                                                                                                                                                             | 11/384(max) [01:24,  7.65s/it]
code:   4%|██████▎                                                                                                                                                                           | 73/2048(max) [00:18,  3.92it/s]

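How should I fix this? I have tested both compile=False and compile=True, and in both cases the program exits when text reaches 3% and code reaches 4%. ChatGPT's answer was that I need to find the place in the source code where the transformers model is used and replace the past_key_values argument with a Cache class. Could you advise how to go about this?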

fumiama commented 3 months ago

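> I am now hitting the same problem as #599

That warning is not the cause of the problem and can be ignored. The real cause lies elsewhere; I suggest opening a new issue and posting the detailed error output.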

pustar commented 3 months ago

Thanks @menphix-watanabe, with that change it runs correctly:

import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
custom_path = './pretrained_models/chatTTS/'
device = 'cuda'
chat.load(source='custom', custom_path=custom_path, device=device, compile=False) # Set to True for better performance
texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]

wavs = chat.infer(texts)

# Pass each waveform to torchaudio.save as-is, without .unsqueeze(0)
for i, wav in enumerate(wavs):
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wav), 24000)
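
A note on why dropping `.unsqueeze(0)` helps: the traceback shows that `torchaudio.save` only accepts a 2D tensor. Since the save succeeds once the call is removed, `wavs[i]` is presumably already 2D here (likely channels by samples), and the extra axis made it 3D. If you would rather not depend on the exact shape `chat.infer` returns, a small guard can normalize it first. This is only a sketch assuming NumPy input; `as_channels_by_samples` is a hypothetical helper name, not part of ChatTTS or torchaudio.

```python
import numpy as np


def as_channels_by_samples(wav: np.ndarray) -> np.ndarray:
    """Return `wav` as a 2D (channels, samples) array.

    Hypothetical helper for illustration; not part of ChatTTS or torchaudio.
    """
    wav = np.asarray(wav)
    if wav.ndim == 1:
        # Bare sample vector: add a leading channel axis.
        return wav[np.newaxis, :]
    # Drop leading singleton axes until only two dimensions remain.
    while wav.ndim > 2 and wav.shape[0] == 1:
        wav = wav[0]
    if wav.ndim != 2:
        raise ValueError(f"cannot reduce shape {wav.shape} to 2D")
    return wav
```

In the loop above it would be used as `torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(as_channels_by_samples(wav)), 24000)`. Raising on shapes it cannot reduce is deliberate, so an unexpected batch dimension is reported instead of being silently flattened.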