2noise / ChatTTS

A generative speech model for daily dialogue.
https://2noise.com
GNU Affero General Public License v3.0
32.62k stars 3.54k forks

Infinite loop occurs while using the zero-shot function #673

Closed HUICHIII closed 2 months ago

HUICHIII commented 3 months ago

Issue Description: When I use the zero-shot function and try to synthesize audio, the program gets stuck in an infinite loop. The code is as follows:

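import logging

import ChatTTS
# Assumed import paths: helpers from the ChatTTS repository's tools package.
from tools.audio import load_audio
from tools.logger import get_logger

logger = get_logger("Test #511", lv=logging.WARN)
chat = ChatTTS.Chat(logger)
chat.load(compile=False, source="huggingface")  # Set to True for better performance
texts = [
    # "You don't seem to be in a good mood today; did something happen?"
    "你今天似乎心情不太好,是发生了什么事情吗",
]

params_infer_code = ChatTTS.Chat.InferCodeParams(
    # Speaker prompt sampled from the reference audio, loaded at 24 kHz.
    spk_smp=chat.sample_audio_speaker(load_audio("debug.wav", 24000)),
    # Transcript of the reference audio:
    # "Well, I went to the city of Tangshan to bring my father's body back."
    txt_smp="就是,我是去唐山那个城市,接我爸的遗体回来的。",
)
wavs = chat.infer(
    texts,
    skip_refine_text=False,
    params_infer_code=params_infer_code,
)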

The output is as follows:

/xxx/env/ChatTTS/lib/python3.12/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:462: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/xxx/env/ChatTTS/lib/python3.12/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:647: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/xxx/env/ChatTTS/lib/python3.12/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:162: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/xxx/env/ChatTTS/lib/python3.12/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
[+0800 20240807 18:07:27] [WARN] Test #511 | gpu | no GPU found, use CPU instead
/xxx/opensource/ChatTTS/ChatTTS/model/tokenizer.py:24: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  tokenizer: BertTokenizerFast = torch.load(
/xxx/env/ChatTTS/lib/python3.12/site-packages/vector_quantize_pytorch/residual_fsq.py:170: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with autocast(enabled = False):
/xxx/env/ChatTTS/lib/python3.12/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:192: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with quantization_context():
text:   0%|                                                                                                                                                                                                        | 0/384(max) [00:00, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
text:   7%|█████████████▏                                                                                                                                                                                     | 26/384(max) [00:01, 19.11it/s]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:49] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:49] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:49] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:49] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:50] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:50] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:50] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:50] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:51] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:51] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:51] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:51] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:52] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:52] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:52] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:52] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:53] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:53] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:53] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:53] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:54] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:54] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s][+0800 20240807 17:27:54] [WARN] Test #511 | gpt | unexpected end at index [0]
code:   0%|                                                                                                                                                                                                       | 0/2048(max) [00:00, ?it/s]
[+0800 20240807 17:27:54] [WARN] Test #511 | gpt | regenerate in order to ensure non-empty
code:   0%|

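The log above keeps repeating indefinitely. The debug.wav I used has a sample rate of 16 kHz; I also tried converting it to 24 kHz with sox, but the same problem occurs. I also tried other prompt audio files, with the same result. Also, for convenience I did not use a GPU and generated the audio on the CPU instead; I'm not sure whether that has any effect.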
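Since the report hinges on the sample rate and length of the prompt audio, a quick way to confirm what a file actually contains is to read its WAV header. The following is a minimal sketch using only Python's standard `wave` module; the helper names (`wav_sample_rate`, `wav_duration_seconds`, `describe_prompt`) are hypothetical and not part of ChatTTS, and it only handles PCM WAV files. Whether `load_audio` resamples internally is not established in this thread, so this only inspects the file on disk.

```python
import wave

TARGET_RATE = 24000  # the rate passed to load_audio in the snippet from the report


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()


def wav_duration_seconds(path: str) -> float:
    """Return the duration in seconds, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def describe_prompt(path: str) -> str:
    """Summarize a prompt file and say whether its rate equals TARGET_RATE."""
    rate = wav_sample_rate(path)
    duration = wav_duration_seconds(path)
    note = "matches" if rate == TARGET_RATE else "differs from"
    return f"{path}: {rate} Hz, {duration:.2f} s ({note} {TARGET_RATE} Hz)"
```

Calling `describe_prompt("debug.wav")` before and after the sox conversion would show whether the conversion took effect and how long the prompt really is.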

HUICHIII commented 3 months ago

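After reading the code carefully, this looks like the same problem as issue #648: the GPT seems unable to output any code, so it keeps regenerating. However, even after raising the temperature to 0.5, trimming the audio to 8 seconds, and carefully checking the audio against its transcript, it still loops endlessly.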
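The behavior described here, regenerating whenever the output is empty, loops forever if every attempt comes back empty. As an illustration only, here is a minimal sketch of the same retry idea with an upper bound on attempts. It is not ChatTTS's implementation; `generate_non_empty`, `EmptyGenerationError`, and the `max_attempts` parameter are hypothetical names introduced for this example.

```python
from typing import Callable, Sequence


class EmptyGenerationError(RuntimeError):
    """Raised when every attempt produced an empty result."""


def generate_non_empty(
    generate: Callable[[], Sequence[int]],
    max_attempts: int = 5,
) -> Sequence[int]:
    """Call `generate` until it returns a non-empty sequence.

    Gives up with EmptyGenerationError after `max_attempts` calls
    instead of retrying indefinitely.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        codes = generate()
        if len(codes) > 0:
            return codes
    raise EmptyGenerationError(f"no output after {max_attempts} attempts")
```

With a bound like this, a prompt that always yields an empty result surfaces as an error the caller can report, rather than as an endless stream of "regenerate" warnings.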

fumiama commented 3 months ago

Try using the latest dev version.

HUICHIII commented 3 months ago

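Try using the latest dev version.

Thanks for the reply.

Do you mean checking out the dev branch? I switched to it and it still doesn't work.

Also, a small bug report: in ChatTTS/model/gpt.py on the dev branch, the manual_seed argument is not being passed in between lines 651 and 652.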

fumiama commented 3 months ago

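I switched to it and it still doesn't work.

Perhaps there is a problem with the audio. You could try the webui instead of hand-written code, and try adjusting the various parameters. If it still doesn't work, post the audio here so we can take a look.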

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 15 days since being marked as stale.