Closed · waylonwang closed this issue 4 months ago
The AUDIO output raises an error when fed into ComfyUI-VideoHelperSuite's "Audio to legacy VHS_AUDIO" node. I checked nodes.py: the final output simply wraps the waveform in a Python list instead of the 3D tensor format: audio = {"waveform": [output['tts_speech']], "sample_rate": target_sr}
Debugging the output of CosyVoice-ComfyUI shows: {'waveform': [tensor([[1.5498e-05, 7.1867e-06, 8.9236e-06, ..., 8.0358e-03, 9.0343e-03, 9.4499e-03]])], 'sample_rate': 22050}
Whereas audio loaded with ComfyUI-VideoHelperSuite's LoadAudio outputs: {'waveform': tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 3.0518e-05, 6.1035e-05, 1.5259e-04]]]), 'sample_rate': 22050}
And audio loaded with ComfyUI's built-in LoadAudio outputs the same format: {'waveform': tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 3.0518e-05, 6.1035e-05, 1.5259e-04]]]), 'sample_rate': 22050}
I asked ChatGPT about this as well.
So I'd suggest that CosyVoice-ComfyUI change its audio output to the 3D tensor format to improve compatibility.
A simple conversion is enough: audio = {"waveform": torch.stack([output['tts_speech']]), "sample_rate": target_sr}
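For reference, a minimal standalone sketch of the difference (hypothetical values, not the node's actual code), assuming output['tts_speech'] is a 2D tensor of shape [channels, samples]:

```python
import torch

target_sr = 22050
tts_speech = torch.randn(1, target_sr)  # stand-in for output['tts_speech'], shape [channels, samples]

# Old output: a Python list wrapping the 2D tensor. Downstream nodes that
# treat audio["waveform"] as a tensor (e.g. reading .shape) break on a list.
audio_old = {"waveform": [tts_speech], "sample_rate": target_sr}
print(type(audio_old["waveform"]))   # <class 'list'>

# Fixed output: stack along a new batch dimension -> [batch, channels, samples],
# matching what the LoadAudio nodes produce.
audio_new = {"waveform": torch.stack([tts_speech]), "sample_rate": target_sr}
print(audio_new["waveform"].shape)   # torch.Size([1, 1, 22050])

# tts_speech.unsqueeze(0) would give the same 3D shape.
```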
Thanks for the suggestion, this has been fixed.