AIFSH / CosyVoice-ComfyUI

A ComfyUI custom node for CosyVoice
Apache License 2.0
126 stars, 12 forks

Compatibility issue with the audio output format #15

Closed waylonwang closed 2 months ago

waylonwang commented 2 months ago

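Feeding the AUDIO output into the "Audio to legacy VHS_AUDIO" node of ComfyUI-VideoHelperSuite raises an error. I checked nodes.py: the final output simply wraps the waveform in a Python list instead of returning a 3D tensor: audio = {"waveform": [output['tts_speech']],"sample_rate":target_sr}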

Debugging the output of CosyVoice-ComfyUI gives: {'waveform': [tensor([[1.5498e-05, 7.1867e-06, 8.9236e-06, ..., 8.0358e-03, 9.0343e-03, 9.4499e-03]])], 'sample_rate': 22050}

Loading audio with the LoadAudio node from ComfyUI-VideoHelperSuite, by contrast, gives: {'waveform': tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 3.0518e-05, 6.1035e-05, 1.5259e-04]]]), 'sample_rate': 22050}

Loading audio with ComfyUI's built-in LoadAudio node gives the same structure: {'waveform': tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 3.0518e-05, 6.1035e-05, 1.5259e-04]]]), 'sample_rate': 22050}
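The difference between the two formats is easier to see from the container type and shape than from the printed values. The snippet below is only an illustrative sketch: `tts_speech` is a zero-filled stand-in for `output['tts_speech']`, the one-second length is an arbitrary assumption, and `describe` is a hypothetical helper that is not part of either node pack.

```python
import torch

# Stand-in for output['tts_speech']: a 2D tensor, as in the debug output above.
tts_speech = torch.zeros(1, 22050)

# Format reported in this issue: the 2D tensor wrapped in a Python list.
list_audio = {"waveform": [tts_speech], "sample_rate": 22050}

# Format printed by both LoadAudio nodes: a single 3D tensor.
tensor_audio = {"waveform": torch.zeros(1, 1, 22050), "sample_rate": 22050}


def describe(audio):
    """Return the waveform's container kind plus its shape (or list length)."""
    waveform = audio["waveform"]
    if isinstance(waveform, torch.Tensor):
        return ("tensor", tuple(waveform.shape))
    return (type(waveform).__name__, len(waveform))


print(describe(list_audio))
print(describe(tensor_audio))
```

A downstream node that indexes or reshapes `waveform` as a 3D tensor will fail on the list variant, which matches the error described above.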

I also asked ChatGPT about this:

I therefore suggest that CosyVoice-ComfyUI change its audio output to a 3D tensor to improve compatibility.

waylonwang commented 2 months ago

A conversion like this is enough: audio = {"waveform": torch.stack([output['tts_speech']]),"sample_rate":target_sr}
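As a rough sketch of why this works: `torch.stack` over a one-element list adds a new leading dimension of size 1, so a 2D tensor becomes a 3D one (the same result as `unsqueeze(0)`). The helper name `to_audio_dict` and the random input are assumptions made for illustration; they are not taken from nodes.py.

```python
import torch


def to_audio_dict(speech, sample_rate):
    """Wrap a 2D tensor in a dict whose waveform is a 3D tensor."""
    if speech.dim() != 2:
        raise ValueError(f"expected a 2D tensor, got {speech.dim()}D")
    # Stacking a one-element list adds a leading dimension of size 1.
    return {"waveform": torch.stack([speech]), "sample_rate": sample_rate}


# Hypothetical stand-in for output['tts_speech'].
speech = torch.randn(1, 22050)
audio = to_audio_dict(speech, 22050)

print(audio["waveform"].shape)
```

The resulting dict has the same structure as the LoadAudio output shown earlier in this thread.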

AIFSH commented 2 months ago

Thanks for the suggestion. This has been fixed.