gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Apache License 2.0
384 stars 31 forks source link

Moondream RuntimeError: expanded size of tensor (749) must match the existing size (750) at non-singleton dimension 1. #28

Closed sjuxax closed 7 months ago

sjuxax commented 7 months ago

Tried the Moondream node and it says this:

ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI-0246/utils.py", line 381, in new_func
    res_value = old_func(*final_args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream_script.py", line 76, in answer_questions
    full_sentence = self.text_model.answer_question(image_embeds, question)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/text_model.py", line 79, in answer_question
    answer = self.generate(
             ^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/text_model.py", line 71, in generate
    output_ids = self.model.generate(
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/transformers/generation/utils.py", line 1544, in generate
    return self.greedy_search(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/transformers/generation/utils.py", line 2404, in greedy_search
    outputs = self(
              ^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 992, in forward
    hidden_states = self.transformer(
                    ^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 933, in forward
    hidden_states = layer(
                    ^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 734, in forward
    attn_outputs = self.mixer(
                   ^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 688, in forward
    attn_output = self._forward_cross_attn(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 664, in _forward_cross_attn
    return self.inner_cross_attn(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeff/.virtualenvs/comfyui/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/net/dj/code/clones/github.com/comfyanonymous/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 462, in forward
    padding_mask.masked_fill_(key_padding_mask, 0.0)
RuntimeError: The expanded size of the tensor (749) must match the existing size (750) at non-singleton dimension 1.  Target sizes: [1, 749].  Tensor sizes: [1, 750]
gokayfem commented 7 months ago

yes im aware of this problem about moondream on other platforms (it works on linux and windows) , i couldnt solve it for mac users. i will look into this in near future.

chaincrafter commented 7 months ago

image Same here, Ubuntu 22.04 comfyui up to date...

gokayfem commented 7 months ago

image Same here, Ubuntu 22.04 comfyui up to date...

i wish i can reproduce same error but in colab(linux) or in my pc windows so i can iterate on this error but it works in both case.

image

i just tried it on colab

chaincrafter commented 7 months ago

i created a complete new conda env with python=3.11. cloned the official comfyui repo. cloned your repo and added the same nodes as you have.

File "/home/dev/ComfyUI/custom_nodes/ComfyUI_VLM_nodes/nodes/moondream/phi/modeling_phi.py", line 462, in forward padding_mask.masked_fill_(key_padding_mask, 0.0) RuntimeError: The expanded size of the tensor (748) must match the existing size (749) at non-singleton dimension 1. Target sizes: [1, 748]. Tensor sizes: [1, 749]

Same problem. Also its a standard ubuntu 22.04 install with a RTX4090.

julien-blanchon commented 7 months ago

That's very strange, this also happen to me with Ubuntu 22.04 and RTX 3090.

A quick fix is to replace:

 if key_padding_mask is not None:
            padding_mask = torch.full(
                (batch_size, seqlen), -10000.0, dtype=scores.dtype, device=scores.device
            )
            padding_mask.masked_fill_(key_padding_mask, 0.0)

            scores = scores + rearrange(padding_mask, "b s -> b 1 1 s")

With

if key_padding_mask is not None:
            padding_mask = torch.full(
                (batch_size, seqlen_k),
                -10000.0,
                dtype=scores.dtype,
                device=scores.device,
            )
            key_padding_mask = key_padding_mask[:, :seqlen_k]
            padding_mask.masked_fill_(key_padding_mask, 0.0)

            scores = scores + rearrange(padding_mask, "b s -> b 1 1 s")

At: https://github.com/gokayfem/ComfyUI_VLM_nodes/blob/a363153eb58d2599150db6310fe21615e7f01aea/nodes/moondream/phi/modeling_phi.py#L386-L392

gokayfem commented 7 months ago

thanks for the help. i added this fix to the repo. i also checked it didnt broke the working ones.