AIFSH / CosyVoice-ComfyUI

A ComfyUI custom node for CosyVoice
Apache License 2.0
174 stars, 22 forks

torch.cat(): expected a non-empty list of Tensors #41

Open kong6001 opened 2 months ago

kong6001 commented 2 months ago

torch.cat(): expected a non-empty list of Tensors

  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 317, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 192, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\custom_nodes\CosyVoice-ComfyUI\nodes.py", line 170, in generate
    for out_dict in output:
  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\cosyvoice.py", line 95, in inference_instruct
    for model_output in self.model.inference(**model_input, stream=stream):
  File "D:\ai\ComfyUI-aki-v1.3-chumen0731\custom_nodes\CosyVoice-ComfyUI\cosyvoice\cli\model.py", line 158, in inference
    this_tts_speech_token = torch.concat(self.tts_speech_token_dict[this_uuid], dim=1)

lovefy-eth commented 2 months ago

+1

skimy2023 commented 2 months ago

+1

skimy2023 commented 2 months ago

Have you solved the problem?

kong6001 commented 2 months ago

> Have you solved the problem?

No.

kanfengjingderen commented 2 months ago

Same problem.

nancygd commented 2 months ago

Maybe it is an environment problem. I ran into it too, but after I created a new environment, the problem went away.
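If you want to test the environment theory, one low-effort step is to compare package versions between the failing ComfyUI environment and a fresh one. Below is a minimal sketch using only the standard library's importlib.metadata; the package list is an assumption (packages CosyVoice is likely to depend on, not something confirmed in this thread), and the script must be run with the same Python interpreter that ComfyUI uses.

```python
from importlib import metadata

# Assumed list of distributions worth comparing; not taken from the issue. Edit freely.
PACKAGES = ["torch", "torchaudio", "onnxruntime", "onnxruntime-gpu", "numpy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:16} {version or 'not installed'}")
    # Extra context: whether this torch build sees a GPU (fp16 models usually run on CUDA).
    try:
        import torch
        print("torch CUDA build:", torch.version.cuda, "| CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not importable from this interpreter")
```

Running it in both environments and diffing the output narrows down which dependency changed.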

piaolingxue commented 2 months ago

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 95, in forward
    chunk_masks0 = torch.unsqueeze(chunk_masks, 0)
    chunk_masks1 = torch.__and__(masks, chunk_masks0)
    x0 = torch.layer_norm(x, [1024], CONSTANTS.c5, CONSTANTS.c6)
    n_batch = torch.size(x0, 0)
    _37 = torch.add(torch.matmul(x0, CONSTANTS.c7), CONSTANTS.c8)

Traceback of TorchScript, original code (most recent call last):
  File "/mnt/lyuxiang.lx/anaconda3/envs/cosyvoice_refactor/lib/python3.8/site-packages/torch/nn/functional.py", line 2546, in forward
            layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
        )
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
           ~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: expected scalar type Float but found Half

!!! Exception during processing !!! torch.cat(): expected a non-empty list of Tensors
Traceback (most recent call last):
  File "/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/custom_nodes/CosyVoice-ComfyUI/nodes.py", line 171, in generate
    for out_dict in output:
  File "/ComfyUI/custom_nodes/CosyVoice-ComfyUI/cosyvoice/cli/cosyvoice.py", line 56, in inference_sft
    for model_output in self.model.inference(**model_input, stream=stream):
  File "/ComfyUI/custom_nodes/CosyVoice-ComfyUI/cosyvoice/cli/model.py", line 158, in inference
    this_tts_speech_token = torch.concat(self.tts_speech_token_dict[this_uuid], dim=1)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors
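
The two tracebacks above are probably one failure seen twice. In the bundled cosyvoice/cli/model.py, inference() appears to run the LLM in a background thread that appends speech tokens to self.tts_speech_token_dict[this_uuid], then joins that thread and calls torch.concat on the list. If the thread dies (here with the TorchScript "expected scalar type Float but found Half" error), Python prints its traceback, but join() does not re-raise it, so the list stays empty and torch.concat fails with the generic message. The stdlib-only sketch below reproduces that pattern; concat, llm_job and inference are hypothetical stand-ins, not the node's actual code.

```python
import threading


def concat(chunks):
    """Hypothetical stand-in for torch.concat: like torch.cat, it rejects an empty list."""
    if not chunks:
        raise RuntimeError("torch.cat(): expected a non-empty list of Tensors")
    return [token for chunk in chunks for token in chunk]


def llm_job(speech_tokens, fail):
    """Hypothetical worker: appends token chunks, or dies before producing any."""
    if fail:
        # Plays the role of the TorchScript Float/Half error inside the LLM thread.
        raise RuntimeError("expected scalar type Float but found Half")
    speech_tokens.append([1, 2, 3])


def inference(fail):
    """Mimics the non-streaming path: start the worker, join it, then concatenate."""
    speech_tokens = []
    worker = threading.Thread(target=llm_job, args=(speech_tokens, fail))
    worker.start()
    # join() waits for the thread but does NOT re-raise an exception raised inside it;
    # Python only prints "Exception in thread ..." to stderr.
    worker.join()
    # If the worker died, speech_tokens is still empty and this raises.
    return concat(speech_tokens)


if __name__ == "__main__":
    print(inference(fail=False))  # [1, 2, 3]
    try:
        inference(fail=True)
    except RuntimeError as err:
        print("main thread sees only:", err)
```

In practice, when ComfyUI reports torch.cat(): expected a non-empty list of Tensors, look further up the console for an earlier "Exception in thread ..." traceback; that is likely the real cause.
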
piaolingxue commented 2 months ago

+1

izonewonyoung commented 2 months ago

+1

goldx3 commented 2 months ago

+1

qy8502 commented 1 month ago

+1

qy8502 commented 1 month ago

I’ve fixed this issue, but the model needs to be modified. Taking CosyVoice-300M-SFT as an example:

cd /opt/ComfyUI/custom_nodes/CosyVoice-ComfyUI/pretrained_models/CosyVoice-300M-SFT
unzip llm.text_encoder.fp16.zip
vi llm.text_encoder.fp16/code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py

Insert input = input.half() at line 24, right after the first torch.layer_norm call. For an unknown reason the input at that point is not of half type (I tried to fix this at the environment level, without success), so it has to be cast explicitly:

  def forward(self: __torch__.cosyvoice.transformer.encoder.___torch_mangle_5.ConformerEncoder,
    xs: Tensor,
    xs_lens: Tensor,
    decoding_chunk_size: int=0,
    num_decoding_left_chunks: int=-1) -> Tuple[Tensor, Tensor]:
    T = torch.size(xs, 1)
    batch_size = torch.size(xs_lens, 0)
    if torch.gt(T, 0):
      max_len = T
    else:
      max_len = torch.item(torch.max(xs_lens))
    seq_range = torch.arange(0, max_len, dtype=4, layout=None, device=ops.prim.device(xs_lens))
    seq_range_expand = torch.expand(torch.unsqueeze(seq_range, 0), [batch_size, int(max_len)])
    seq_length_expand = torch.unsqueeze(xs_lens, -1)
    mask = torch.ge(seq_range_expand, seq_length_expand)
    masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
    embed = self.embed
    _0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
    input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)

    # +++ here, fixed +++
    input = input.half()

    pos_enc = embed.pos_enc
    pe = pos_enc.pe
    _1 = torch.size(pe, 1)
    _2 = torch.size(input, 1)
    _3 = torch.ge(_1, torch.sub(torch.mul(_2, 2), 1))

Then repack the edited folder into llm.text_encoder.fp16.zip:

zip -r llm.text_encoder.fp16.zip llm.text_encoder.fp16

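The manual unzip, edit and zip steps above can also be scripted. The following is a hypothetical helper, not part of the node: it copies the archive entry by entry with Python's zipfile module and inserts input = input.half() after the layer_norm line in the entry whose path ends in ___torch_mangle_5.py. The anchor line is copied from the excerpt above and is assumed to match your model's generated code; if it does not, the function raises instead of guessing.

```python
import zipfile

# Path suffix of the generated encoder source edited in the manual steps.
TARGET_SUFFIX = "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py"
# Line after which the cast goes, copied from the excerpt (assumed to match your model).
ANCHOR = "input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)"
PATCH = "input = input.half()"


def patch_source(source):
    """Return source with PATCH inserted after ANCHOR, keeping ANCHOR's indentation."""
    lines = source.splitlines(keepends=True)
    if any(line.strip() == PATCH for line in lines):
        return source  # already patched; do nothing
    for i, line in enumerate(lines):
        if line.strip() == ANCHOR:
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            indent = line[: len(line) - len(line.lstrip())]
            lines.insert(i + 1, indent + PATCH + "\n")
            return "".join(lines)
    raise ValueError("anchor line not found; the generated code may differ")


def patch_archive(src_zip, dst_zip):
    """Copy every entry of src_zip into dst_zip, patching the encoder source."""
    found = False
    with zipfile.ZipFile(src_zip) as zin, zipfile.ZipFile(dst_zip, "w") as zout:
        for info in zin.infolist():  # keeps the original entry order
            data = zin.read(info.filename)
            if info.filename.endswith(TARGET_SUFFIX):
                data = patch_source(data.decode("utf-8")).encode("utf-8")
                found = True
            zout.writestr(info.filename, data, compress_type=info.compress_type)
    if not found:
        raise ValueError(f"no entry ending in {TARGET_SUFFIX!r} in {src_zip!r}")
```

Hypothetical usage from the model directory: patch_archive("llm.text_encoder.fp16.zip", "llm.text_encoder.fp16.patched.zip"), then keep the original as a backup and rename the patched file to llm.text_encoder.fp16.zip. Writing to a separate file avoids reading and rewriting the same archive at once. This only mirrors the manual steps and has not been tested against the real model file.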
flamingol1 commented 1 month ago

Thanks a lot, I've fixed it successfully.