FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
https://funaudiollm.github.io/
Apache License 2.0
6.12k stars 657 forks

NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. #366

Open alkaidzone opened 2 months ago

alkaidzone commented 2 months ago

When I run the examples in Basic Usage, I get the following error:


2024-09-07 20:09:29,033 - modelscope - INFO - PyTorch version 2.0.1 Found.
2024-09-07 20:09:29,033 - modelscope - INFO - Loading ast index from /Users/jiangwenjing/.cache/modelscope/ast_indexer
2024-09-07 20:09:29,135 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 65db473890940e3550f281ba3a7b9944 and a total number of 980 components indexed
failed to import ttsfrd, use WeTextProcessing instead
/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2024-09-07 20:09:34,490 INFO input frame rate=50
2024-09-07 20:09:37,560 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_tagger.fst
2024-09-07 20:09:37,560 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_tagger.fst
2024-09-07 20:09:37,560 WETEXT INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_verbalizer.fst
2024-09-07 20:09:37,560 INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_verbalizer.fst
2024-09-07 20:09:37,560 WETEXT INFO skip building fst for zh_normalizer ...
2024-09-07 20:09:37,560 INFO skip building fst for zh_normalizer ...
2024-09-07 20:09:37,789 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_tagger.fst
2024-09-07 20:09:37,789 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_tagger.fst
2024-09-07 20:09:37,789 WETEXT INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_verbalizer.fst
2024-09-07 20:09:37,789 INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_verbalizer.fst
2024-09-07 20:09:37,789 WETEXT INFO skip building fst for en_normalizer ...
2024-09-07 20:09:37,789 INFO skip building fst for en_normalizer ...
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/cosyvoice.py", line 45, in __init__
    self.model.load_jit('{}/llm.text_encoder.fp16.zip'.format(model_dir),
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/model.py", line 63, in load_jit
    llm_text_encoder = torch.jit.load(llm_text_encoder_model)
  File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files, _restore_shapes)  # type: ignore[call-arg]
NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes  for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31034 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:22748 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26824 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:929 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:726 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17484 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:16726 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:487 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]

Desktop: Apple M2 Pro, macOS 13.6.9

I have pulled the latest version (updated on September 6), checked the FAQ, and exported PYTHONPATH before running, but it did not work. I then tried adding map_location=self.device in model.load_jit:

def load_jit(self, llm_text_encoder_model, llm_llm_model, flow_encoder_model):
        llm_text_encoder = torch.jit.load(llm_text_encoder_model, map_location=self.device)
        self.llm.text_encoder = llm_text_encoder
        llm_llm = torch.jit.load(llm_llm_model, map_location=self.device)
        self.llm.llm = llm_llm
        flow_encoder = torch.jit.load(flow_encoder_model, map_location=self.device)
        self.flow.encoder = flow_encoder

But I got:

2024-09-07 20:34:25,014 - modelscope - INFO - PyTorch version 2.0.1 Found.
2024-09-07 20:34:25,014 - modelscope - INFO - Loading ast index from /Users/jiangwenjing/.cache/modelscope/ast_indexer
2024-09-07 20:34:25,070 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 65db473890940e3550f281ba3a7b9944 and a total number of 980 components indexed
failed to import ttsfrd, use WeTextProcessing instead
/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2024-09-07 20:34:29,562 INFO input frame rate=50
2024-09-07 20:34:30,862 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_tagger.fst
2024-09-07 20:34:30,862 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_tagger.fst
2024-09-07 20:34:30,863 WETEXT INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_verbalizer.fst
2024-09-07 20:34:30,863 INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_verbalizer.fst
2024-09-07 20:34:30,863 WETEXT INFO skip building fst for zh_normalizer ...
2024-09-07 20:34:30,863 INFO skip building fst for zh_normalizer ...
2024-09-07 20:34:31,097 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_tagger.fst
2024-09-07 20:34:31,097 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_tagger.fst
2024-09-07 20:34:31,097 WETEXT INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_verbalizer.fst
2024-09-07 20:34:31,097 INFO                     /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_verbalizer.fst
2024-09-07 20:34:31,097 WETEXT INFO skip building fst for en_normalizer ...
2024-09-07 20:34:31,097 INFO skip building fst for en_normalizer ...
['中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女']
  0%|                                                                                                                                               | 0/1 [00:00<?, ?it/s]2024-09-07 20:34:33,055 INFO synthesis text 你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/model.py", line 82, in llm_job
    for i in self.llm.inference(text=text.to(self.device),
  File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/llm/llm.py", line 172, in inference
    text, text_len = self.encode(text, text_len)
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/llm/llm.py", line 75, in encode
    encoder_out, encoder_mask = self.text_encoder(text, text_lengths, decoding_chunk_size=1, num_decoding_left_chunks=-1)
  File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 22, in forward
    masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
    embed = self.embed
    _0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
                   ~~~~~~~~~~~~ <--- HERE
    input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)
    pos_enc = embed.pos_enc

Traceback of TorchScript, original code (most recent call last):
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

  0%|                                                                                                                                               | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/cosyvoice.py", line 61, in inference_sft
    for model_output in self.model.inference(**model_input, stream=stream):
  File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/model.py", line 167, in inference
    this_tts_speech_token = torch.concat(self.tts_speech_token_dict[this_uuid], dim=1)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
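
The second traceback suggests that passing map_location alone may not be enough on a machine without CUDA: the fp16 TorchScript modules then load on the CPU, where PyTorch 2.0.1 reports that the matmul is not implemented for Half. Below is a minimal sketch of one possible guard, assuming a loader that is able to skip the fp16 JIT modules on CPU. resolve_jit_settings is a hypothetical helper for illustration and is not part of CosyVoice.

```python
from typing import Tuple


def resolve_jit_settings(cuda_available: bool, want_fp16_jit: bool = True) -> Tuple[str, bool]:
    """Return (map_location, load_fp16_jit) for a hypothetical model loader.

    Assumption: the fp16 TorchScript modules are only usable on CUDA, so on a
    CPU-only machine they are skipped rather than remapped to the CPU.
    """
    if cuda_available:
        # On CUDA, honor the caller's preference for the fp16 JIT modules.
        return "cuda", want_fp16_jit
    # Without CUDA, fall back to CPU and do not load the fp16 JIT modules.
    return "cpu", False


if __name__ == "__main__":
    # Example: a machine without CUDA (such as the Apple M2 Pro in this report).
    print(resolve_jit_settings(cuda_available=False))
```

In a real loader the flag would come from torch.cuda.is_available(), and the returned device string would be passed as map_location to torch.jit.load only when the second value is True.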
danpeen commented 1 month ago

encountered the same issue.

xipingL commented 1 month ago

same

silvercherry commented 1 month ago

same

Nebuluss commented 1 month ago

I'm getting a very similar error when I try to use it in ComfyUI with https://github.com/melMass/CosyVoice-ComfyUI/ 👍

input frame rate=50
synthesis text Hi, I'm Tania, a generative speech model, how can I help you?
Exception in thread Thread-29 (llm_job):
Traceback (most recent call last):
  File "threading.py", line 1045, in _bootstrap_inner
  File "threading.py", line 982, in run
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\model.py", line 79, in llm_job
    for i in self.llm.inference(
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 36, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\llm\llm.py", line 218, in inference
    text, text_len = self.encode(text, text_len)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\llm\llm.py", line 77, in encode
    encoder_out, encoder_mask = self.text_encoder(
                                ^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 95, in forward
    chunk_masks0 = torch.unsqueeze(chunk_masks, 0)
    chunk_masks1 = torch.__and__(masks, chunk_masks0)
    x0 = torch.layer_norm(x, [1024], CONSTANTS.c5, CONSTANTS.c6)
    n_batch = torch.size(x0, 0)
    _37 = torch.add(torch.matmul(x0, CONSTANTS.c7), CONSTANTS.c8)

Traceback of TorchScript, original code (most recent call last):
  File "/mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/functional.py", line 2546, in forward
            layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
        )
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
           ~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: expected scalar type Float but found Half

!!! Exception during processing !!! torch.cat(): expected a non-empty list of Tensors
Traceback (most recent call last):
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\nodes.py", line 234, in process
    return (self.to_comfy(output, speed),)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\nodes.py", line 181, in to_comfy
    for inf in inference_output:
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\cosyvoice.py", line 62, in inference_sft
    for model_output in self.model.inference(**model_input, stream=stream):
  File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\model.py", line 234, in inference
    this_tts_speech_token = torch.concat(
                            ^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors

Prompt executed in 15.38 seconds
wu0792 commented 1 month ago
image

All three of these places need the argument map_location=self.device passed in.

xianqiliu commented 1 month ago

same

alkaidzone commented 1 month ago

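image All three of these places need the argument map_location=self.device passed in.

Thanks for the answer, but it doesn't seem to work for me. As I wrote in the issue description, I already tried this and it only changed the error to the "addmm_impl_cpu_" not implemented for 'Half' failure shown there.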

wu0792 commented 1 month ago

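GPT's answer is for reference, but I think the problem may be your environment, for example PyTorch and related packages. It is best to follow the README's installation steps strictly, such as installing everything in a dedicated conda environment. The only changes I made were the map_location change mentioned above and deleting ~/.cache/modelscope/ast_indexer, and with those it runs fine on my Mac.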

zhangdxchn commented 1 month ago

Same issue. See this; it works for me.

qy8502 commented 1 month ago

same issue. /mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/functional.py ??

qy8502 commented 1 month ago

I’ve fixed this issue, but the model needs to be modified. Taking CosyVoice-300M-SFT as an example:

cd /opt/ComfyUI/custom_nodes/CosyVoice-ComfyUI/pretrained_models/CosyVoice-300M-SFT
unzip llm.text_encoder.fp16.zip
vi llm.text_encoder.fp16/code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py

Insert input = input.half() at line 24. For a reason I could not determine (I tried to fix it as an environment issue, without success), input is not of type half at that point.

  def forward(self: __torch__.cosyvoice.transformer.encoder.___torch_mangle_5.ConformerEncoder,
    xs: Tensor,
    xs_lens: Tensor,
    decoding_chunk_size: int=0,
    num_decoding_left_chunks: int=-1) -> Tuple[Tensor, Tensor]:
    T = torch.size(xs, 1)
    batch_size = torch.size(xs_lens, 0)
    if torch.gt(T, 0):
      max_len = T
    else:
      max_len = torch.item(torch.max(xs_lens))
    seq_range = torch.arange(0, max_len, dtype=4, layout=None, device=ops.prim.device(xs_lens))
    seq_range_expand = torch.expand(torch.unsqueeze(seq_range, 0), [batch_size, int(max_len)])
    seq_length_expand = torch.unsqueeze(xs_lens, -1)
    mask = torch.ge(seq_range_expand, seq_length_expand)
    masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
    embed = self.embed
    _0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
    input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)

    # +++ here, fixed +++
    input = input.half()

    pos_enc = embed.pos_enc
    pe = pos_enc.pe
    _1 = torch.size(pe, 1)
    _2 = torch.size(input, 1)
    _3 = torch.ge(_1, torch.sub(torch.mul(_2, 2), 1))

Then re-create the archive:

zip -r llm.text_encoder.fp16.zip llm.text_encoder.fp16
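
The unzip, edit, and re-zip steps above can also be sketched in Python with the standard zipfile module. This is a minimal sketch under assumptions, not a tested replacement for the manual steps: insert_after and patch_archive are hypothetical helper names, the output is written uncompressed (ZIP_STORED) as a conservative choice, and the result has not been verified against torch.jit.load.

```python
import zipfile


def insert_after(source: str, marker: str, new_stmt: str) -> str:
    """Insert new_stmt after the first line containing marker, reusing that line's indentation."""
    lines = source.split("\n")
    for i, line in enumerate(lines):
        if marker in line:
            indent = line[: len(line) - len(line.lstrip())]
            lines.insert(i + 1, indent + new_stmt)
            return "\n".join(lines)
    raise ValueError(f"marker not found: {marker!r}")


def patch_archive(src_zip: str, dst_zip: str, member_suffix: str,
                  marker: str, new_stmt: str) -> int:
    """Copy src_zip to dst_zip, patching members whose name ends with member_suffix.

    Returns the number of patched members. All other members are copied unchanged.
    """
    patched = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(member_suffix):
                text = insert_after(data.decode("utf-8"), marker, new_stmt)
                data = text.encode("utf-8")
                patched += 1
            dst.writestr(info.filename, data, compress_type=zipfile.ZIP_STORED)
    return patched
```

For the fix described above, the call would look roughly like patch_archive("llm.text_encoder.fp16.zip", "llm.text_encoder.fp16.patched.zip", "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", "input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)", "input = input.half()"). Writing to a separate output file keeps the original archive available as a backup.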