Open kong6001 opened 2 months ago
+1
+1
Have you solved the problem?
Have you solved the problem?
No.
same problem
Maybe it is an environment problem. I ran into the same issue, but after creating a new environment the problem went away.
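For anyone trying the clean-environment route, a minimal sketch of what that can look like (assuming conda and that the custom node ships a requirements.txt; the env name is arbitrary, and ComfyUI has to be launched from this environment for it to matter):

conda create -n cosyvoice_clean python=3.8 -y   # 3.8 is what upstream CosyVoice recommends
conda activate cosyvoice_clean
cd ComfyUI/custom_nodes/CosyVoice-ComfyUI
pip install -r requirements.txt                 # assumes a requirements.txt is shipped with the node
pip install torch torchaudio                    # pick the wheel matching your CUDA version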
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 95, in forward
    chunk_masks0 = torch.unsqueeze(chunk_masks, 0)
    chunk_masks1 = torch.__and__(masks, chunk_masks0)
    x0 = torch.layer_norm(x, [1024], CONSTANTS.c5, CONSTANTS.c6)
    n_batch = torch.size(x0, 0)
    _37 = torch.add(torch.matmul(x0, CONSTANTS.c7), CONSTANTS.c8)

Traceback of TorchScript, original code (most recent call last):
  File "/mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/functional.py", line 2546, in forward
        layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
    )
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
           ~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: expected scalar type Float but found Half
!!! Exception during processing !!! torch.cat(): expected a non-empty list of Tensors
Traceback (most recent call last):
File "/ComfyUI/execution.py", line 323, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 198, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "/ComfyUI/execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/custom_nodes/CosyVoice-ComfyUI/nodes.py", line 171, in generate
for out_dict in output:
File "/ComfyUI/custom_nodes/CosyVoice-ComfyUI/cosyvoice/cli/cosyvoice.py", line 56, in inference_sft
for model_output in self.model.inference(**model_input, stream=stream):
File "/ComfyUI/custom_nodes/CosyVoice-ComfyUI/cosyvoice/cli/model.py", line 158, in inference
this_tts_speech_token = torch.concat(self.tts_speech_token_dict[this_uuid], dim=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors
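For what it's worth, the torch.cat() failure is most likely just the downstream symptom: if the text encoder call raises (e.g. the Float/Half mismatch above), no speech tokens ever get appended for that uuid, and model.py then concatenates an empty list. A tiny illustration (not CosyVoice code):

import torch

tokens = []                    # stands in for an empty self.tts_speech_token_dict[this_uuid]
try:
    torch.cat(tokens, dim=1)   # raises the same message as above
except RuntimeError as e:
    print(e)                   # torch.cat(): expected a non-empty list of Tensors

So fixing the dtype error in the text encoder should make this one go away as well.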
+1
+1
+1
+1
I’ve fixed this issue, but the model needs to be modified. Taking CosyVoice-300M-SFT as an example:
cd /opt/ComfyUI/custom_nodes/CosyVoice-ComfyUI/pretrained_models/CosyVoice-300M-SFT
unzip llm.text_encoder.fp16.zip
vi llm.text_encoder.fp16/code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py
Insert input = input.half() at line 24 (right after the torch.layer_norm call, as shown below). For some unknown reason the input is not half precision; I tried to fix it as an environment problem, but without success.
def forward(self: __torch__.cosyvoice.transformer.encoder.___torch_mangle_5.ConformerEncoder,
            xs: Tensor,
            xs_lens: Tensor,
            decoding_chunk_size: int=0,
            num_decoding_left_chunks: int=-1) -> Tuple[Tensor, Tensor]:
  T = torch.size(xs, 1)
  batch_size = torch.size(xs_lens, 0)
  if torch.gt(T, 0):
    max_len = T
  else:
    max_len = torch.item(torch.max(xs_lens))
  seq_range = torch.arange(0, max_len, dtype=4, layout=None, device=ops.prim.device(xs_lens))
  seq_range_expand = torch.expand(torch.unsqueeze(seq_range, 0), [batch_size, int(max_len)])
  seq_length_expand = torch.unsqueeze(xs_lens, -1)
  mask = torch.ge(seq_range_expand, seq_length_expand)
  masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
  embed = self.embed
  _0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
  input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)
  # +++ here, fixed +++
  input = input.half()
  pos_enc = embed.pos_enc
  pe = pos_enc.pe
  _1 = torch.size(pe, 1)
  _2 = torch.size(input, 1)
  _3 = torch.ge(_1, torch.sub(torch.mul(_2, 2), 1))
zip -r llm.text_encoder.fp16.zip llm.text_encoder.fp16
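For those who would rather not edit the serialized code inside the zip, the same cast can in principle be applied from outside the archive. A hypothetical sketch (illustrative only, not the upstream CosyVoice code) that wraps the JIT-loaded fp16 text encoder and casts its input to half on every call:

import torch

class HalfInputWrapper(torch.nn.Module):
    # Hypothetical helper: make float32 activations half precision before they
    # reach the fp16-exported TorchScript encoder, leaving the zip untouched.
    def __init__(self, jit_encoder):
        super().__init__()
        self.jit_encoder = jit_encoder

    def forward(self, xs, xs_lens, decoding_chunk_size=0, num_decoding_left_chunks=-1):
        return self.jit_encoder(xs.half(), xs_lens, decoding_chunk_size, num_decoding_left_chunks)

Where exactly to wrap it depends on where the text encoder is loaded and called in the CosyVoice code, so treat this as a sketch of the idea rather than a drop-in patch.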
Thanks a lot, I've fixed it successfully.
torch.cat(): expected a non-empty list of Tensors
File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 317, in execute output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 192, in get_output_data return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 169, in _map_node_over_list process_inputs(input_dict, i) File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 158, in process_inputs results.append(getattr(obj, func)(inputs)) File "D:\ai\ComfyUI-aki-v1.3-chumen0731\custom_nodes\CosyVoice-ComfyUI\nodes.py", line 170, in generate for out_dict in output: File "D:\ai\ComfyUI-aki-v1.3-chumen0731\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\cosyvoice.py", line 95, in inference_instruct for model_output in self.model.inference(model_input, stream=stream): File "D:\ai\ComfyUI-aki-v1.3-chumen0731\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\model.py", line 158, in inference this_tts_speech_token = torch.concat(self.tts_speech_token_dict[this_uuid], dim=1)