Open alkaidzone opened 2 months ago
encountered the same issue.
same
same
I'm getting a very similar error when I try to use it in ComfyUI with https://github.com/melMass/CosyVoice-ComfyUI/
input frame rate=50
synthesis text Hi, I'm Tania, a generative speech model, how can I help you?
Exception in thread Thread-29 (llm_job):
Traceback (most recent call last):
File "threading.py", line 1045, in _bootstrap_inner
File "threading.py", line 982, in run
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\model.py", line 79, in llm_job
for i in self.llm.inference(
File "D:\ANEWCOMFY\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 36, in generator_context
response = gen.send(None)
^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\llm\llm.py", line 218, in inference
text, text_len = self.encode(text, text_len)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\llm\llm.py", line 77, in encode
encoder_out, encoder_mask = self.text_encoder(
^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 95, in forward
chunk_masks0 = torch.unsqueeze(chunk_masks, 0)
chunk_masks1 = torch.__and__(masks, chunk_masks0)
x0 = torch.layer_norm(x, [1024], CONSTANTS.c5, CONSTANTS.c6)
n_batch = torch.size(x0, 0)
_37 = torch.add(torch.matmul(x0, CONSTANTS.c7), CONSTANTS.c8)
Traceback of TorchScript, original code (most recent call last):
File "/mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/functional.py", line 2546, in forward
layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
)
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: expected scalar type Float but found Half
!!! Exception during processing !!! torch.cat(): expected a non-empty list of Tensors
Traceback (most recent call last):
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 323, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 198, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\nodes.py", line 234, in process
return (self.to_comfy(output, speed),)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\nodes.py", line 181, in to_comfy
for inf in inference_output:
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\cosyvoice.py", line 62, in inference_sft
for model_output in self.model.inference(**model_input, stream=stream):
File "D:\ANEWCOMFY\ComfyUI_windows_portable\ComfyUI\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\model.py", line 234, in inference
this_tts_speech_token = torch.concat(
^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors
Prompt executed in 15.38 seconds
All three of these places need the argument map_location=self.device added.
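For reference, a minimal sketch of what passing map_location to torch.jit.load looks like. The tiny Scale module and the in-memory buffer are stand-ins invented for illustration; the actual load calls in CosyVoice's load_jit are not shown in this thread, so treat the names here as assumptions.

```python
import io

import torch


class Scale(torch.nn.Module):
    """Hypothetical stand-in for a TorchScript-exported submodule."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0


# Export the module to an in-memory archive, much like a .zip checkpoint on disk.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(Scale()), buffer)
buffer.seek(0)

# Pick the device the rest of the pipeline runs on (CPU when CUDA is unavailable).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location remaps tensors stored in the archive onto `device` at load time.
loaded = torch.jit.load(buffer, map_location=device)

result = loaded(torch.ones(3, device=device))
print(result.device.type, result.tolist())
```

Note that map_location only moves tensors between devices. It does not change their dtype, so an fp16 archive still holds fp16 weights after loading, which may be why this change alone can swap one error for another.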
same
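Thanks for the answer, but it doesn't seem to work for me. As I wrote in the issue description, I already tried this approach and it only changed the error to a different one.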
2024-09-07 20:34:25,014 - modelscope - INFO - PyTorch version 2.0.1 Found.
2024-09-07 20:34:25,014 - modelscope - INFO - Loading ast index from /Users/jiangwenjing/.cache/modelscope/ast_indexer
2024-09-07 20:34:25,070 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 65db473890940e3550f281ba3a7b9944 and a total number of 980 components indexed
failed to import ttsfrd, use WeTextProcessing instead
/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2024-09-07 20:34:29,562 INFO input frame rate=50
2024-09-07 20:34:30,862 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_tagger.fst
2024-09-07 20:34:30,862 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_tagger.fst
2024-09-07 20:34:30,863 WETEXT INFO /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_verbalizer.fst
2024-09-07 20:34:30,863 INFO /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/zh_tn_verbalizer.fst
2024-09-07 20:34:30,863 WETEXT INFO skip building fst for zh_normalizer ...
2024-09-07 20:34:30,863 INFO skip building fst for zh_normalizer ...
2024-09-07 20:34:31,097 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_tagger.fst
2024-09-07 20:34:31,097 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_tagger.fst
2024-09-07 20:34:31,097 WETEXT INFO /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_verbalizer.fst
2024-09-07 20:34:31,097 INFO /opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/tn/en_tn_verbalizer.fst
2024-09-07 20:34:31,097 WETEXT INFO skip building fst for en_normalizer ...
2024-09-07 20:34:31,097 INFO skip building fst for en_normalizer ...
['中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女']
0%| | 0/1 [00:00<?, ?it/s]2024-09-07 20:34:33,055 INFO synthesis text 你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?
Exception in thread Thread-2:
Traceback (most recent call last):
File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/model.py", line 82, in llm_job
for i in self.llm.inference(text=text.to(self.device),
File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/llm/llm.py", line 172, in inference
text, text_len = self.encode(text, text_len)
File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/llm/llm.py", line 75, in encode
encoder_out, encoder_mask = self.text_encoder(text, text_lengths, decoding_chunk_size=1, num_decoding_left_chunks=-1)
File "/opt/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 22, in forward
masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
embed = self.embed
_0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
~~~~~~~~~~~~ <--- HERE
input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)
pos_enc = embed.pos_enc
Traceback of TorchScript, original code (most recent call last):
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "test.py", line 9, in <module>
for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/cosyvoice.py", line 61, in inference_sft
for model_output in self.model.inference(**model_input, stream=stream):
File "/Users/jiangwenjing/cs-records/NUS/dl/cosyvoice/CosyVoice/cosyvoice/cli/model.py", line 167, in inference
this_tts_speech_token = torch.concat(self.tts_speech_token_dict[this_uuid], dim=1)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
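The final torch.cat() error in both logs looks like a follow-on symptom rather than a separate bug: the llm_job thread dies on the dtype error before producing any tokens, so the main thread later concatenates an empty list. A minimal sketch of that pattern using only the standard library; the names llm_job and tokens mirror the traceback, but the bodies are invented for illustration and are not CosyVoice code.

```python
import threading

tokens = []          # stands in for self.tts_speech_token_dict[this_uuid]
worker_errors = []   # exceptions captured from the worker thread


def llm_job():
    # Stand-in for the real job: it fails before appending any token.
    raise RuntimeError("expected scalar type Float but found Half")


def run_worker():
    try:
        llm_job()
    except Exception as exc:
        # Record the failure so the main thread can see the root cause.
        worker_errors.append(exc)


thread = threading.Thread(target=run_worker)
thread.start()
thread.join()

if worker_errors:
    # Report the worker's error instead of failing later on an empty token list.
    print(f"worker failed: {worker_errors[0]}; tokens collected: {len(tokens)}")
```

When reading these logs, the first exception (raised inside the thread) is the one to fix; the torch.cat() message should disappear once the worker succeeds.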
Same issue. See this; it worked for me.
Same issue. Why does the traceback point to /mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/functional.py?
I’ve fixed this issue, but the model needs to be modified. Taking CosyVoice-300M-SFT as an example:
cd /opt/ComfyUI/custom_nodes/CosyVoice-ComfyUI/pretrained_models/CosyVoice-300M-SFT
unzip llm.text_encoder.fp16.zip
vi llm.text_encoder.fp16/code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py
Insert input = input.half() at line 24. For a reason I could not track down (I tried to fix it at the environment level, without success), the input at that point is not of Half type.
def forward(self: __torch__.cosyvoice.transformer.encoder.___torch_mangle_5.ConformerEncoder,
    xs: Tensor,
    xs_lens: Tensor,
    decoding_chunk_size: int=0,
    num_decoding_left_chunks: int=-1) -> Tuple[Tensor, Tensor]:
  T = torch.size(xs, 1)
  batch_size = torch.size(xs_lens, 0)
  if torch.gt(T, 0):
    max_len = T
  else:
    max_len = torch.item(torch.max(xs_lens))
  seq_range = torch.arange(0, max_len, dtype=4, layout=None, device=ops.prim.device(xs_lens))
  seq_range_expand = torch.expand(torch.unsqueeze(seq_range, 0), [batch_size, int(max_len)])
  seq_length_expand = torch.unsqueeze(xs_lens, -1)
  mask = torch.ge(seq_range_expand, seq_length_expand)
  masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
  embed = self.embed
  _0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
  input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)
  # +++ here, fixed +++
  input = input.half()
  pos_enc = embed.pos_enc
  pe = pos_enc.pe
  _1 = torch.size(pe, 1)
  _2 = torch.size(input, 1)
  _3 = torch.ge(_1, torch.sub(torch.mul(_2, 2), 1))
zip -r llm.text_encoder.fp16.zip llm.text_encoder.fp16
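The manual unzip, edit, and re-zip steps above could also be scripted. A minimal sketch using only the standard library; the patch_archive helper, its parameters, and the choice to match on the layer_norm line instead of a hard-coded line number are assumptions for illustration, not part of CosyVoice. Whether torch.jit.load accepts the rewritten archive has not been verified here, so keep a backup of the original file.

```python
import zipfile


def patch_archive(src_zip, dst_zip, member_suffix, anchor, new_stmt):
    """Copy src_zip to dst_zip, inserting new_stmt after the first line that
    contains `anchor` in the member whose name ends with member_suffix.
    Returns True if a line was inserted."""
    patched = False
    with zipfile.ZipFile(src_zip) as zin, zipfile.ZipFile(dst_zip, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.endswith(member_suffix) and not patched:
                lines = data.decode("utf-8").split("\n")
                for idx, line in enumerate(lines):
                    if anchor in line:
                        # Reuse the anchor line's indentation for the new statement.
                        indent = line[: len(line) - len(line.lstrip())]
                        lines.insert(idx + 1, indent + new_stmt)
                        patched = True
                        break
                data = "\n".join(lines).encode("utf-8")
            # Reusing the original ZipInfo keeps each entry's name and compression type.
            zout.writestr(info, data)
    return patched


# Hypothetical usage with the file names from this thread:
# patch_archive(
#     "llm.text_encoder.fp16.zip",
#     "llm.text_encoder.fp16.patched.zip",
#     "encoder/___torch_mangle_5.py",
#     "input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)",
#     "input = input.half()",
# )
```

Matching on the statement text rather than on "line 24" is a deliberate choice in this sketch, since line numbers in the serialized code may differ between model exports.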
When running the examples in Basic Usage, I get errors:
Desktop: Apple M2 Pro, macOS 13.6.9
I have pulled the latest version (updated on 9.6), checked the FAQ, and exported PYTHONPATH before running, but it still did not work. I tried adding
map_location = self.device
in model.load_jit, but I got: