comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/

[Feature Request:] Allow prompt batching #1561

Open juancopi81 opened 1 year ago

juancopi81 commented 1 year ago

I've been experimenting with batch generation, and it works fine with the Image Batch and Mask Batch nodes from WAS_Node_Suite.py. However, there is no way to handle batches of text inputs. I've modified the encode method in ComfyUI/nodes.py, and it appears to be working as expected.

class CLIPTextEncode:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"text": ("STRING", {"multiline": True}), "clip": ("CLIP", )}}
    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "encode"

    CATEGORY = "conditioning"

    def encode(self, clip, text):
        if isinstance(text, list):
            # Encode each prompt separately, then batch the results.
            batched_cond = []
            batched_pooled = []

            for single_text in text:
                tokens = clip.tokenize(single_text)
                cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
                batched_cond.append(cond)
                batched_pooled.append({"pooled_output": pooled})

            # Stack along a new dimension, then drop the per-prompt singleton
            # dimension to get a [batch, seq_len, dim] conditioning tensor.
            # (torch is already imported at the top of nodes.py.)
            batched_cond_tensor = torch.stack(batched_cond, dim=0).squeeze(1)

            return ([[batched_cond_tensor, {"pooled_output": batched_pooled}]], )
        else:
            # Original single-prompt path.
            tokens = clip.tokenize(text)
            cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
            return ([[cond, {"pooled_output": pooled}]], )

Would you be interested in a Pull Request for this change?
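For illustration, here is a minimal sketch of a hypothetical helper node that could feed the modified encoder one prompt per line of a text box. The node name, category, and the OUTPUT_IS_LIST wiring are assumptions, not part of the proposal above; how ComfyUI actually delivers a list to the downstream encoder depends on its list-handling attributes (e.g. INPUT_IS_LIST on the consumer), so treat this purely as a sketch.

class PromptListFromText:
    """Hypothetical helper: split a multiline string into one prompt per line."""
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"prompts": ("STRING", {"multiline": True})}}

    RETURN_TYPES = ("STRING",)
    OUTPUT_IS_LIST = (True,)  # emit the prompts as a list, one string per line
    FUNCTION = "split"
    CATEGORY = "utils"

    def split(self, prompts):
        lines = [line.strip() for line in prompts.splitlines() if line.strip()]
        return (lines,)

Wired before the modified CLIPTextEncode (and assuming the encoder receives the whole list rather than one item per execution), six lines in the text box would correspond to six prompts, one per image in the batch.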

CHollman82 commented 11 months ago

There are plenty of ways to do this already though... I use wildcard files to dynamically create different prompts with each generation. If you just want to iterate through a list of prompts, you could still use a wildcard file and use an iterator from 0 to [num_prompts] as the input. This node that already exists might even work:

(screenshot of the node omitted)

I'm not sure if iterating starting at 0 would step through each line; probably not, actually, if it's being used as a seed, but it could easily be modified so that the seed input is just a line-index input (with a modulus to wrap around). Heck, I'd even do it for you; I'm working on a node pack and would be interested in this as well.
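A rough sketch of what such a line-index node could look like (the node name, category, and file handling here are hypothetical, not an existing node):

class PromptLineFromFile:
    """Hypothetical node: return one line from a prompt file, selected by index."""
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "file_path": ("STRING", {"default": "prompts.txt"}),
            "line_index": ("INT", {"default": 0, "min": 0, "max": 0xffffffff}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "read_line"
    CATEGORY = "utils"

    def read_line(self, file_path, line_index):
        with open(file_path, "r", encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
        if not lines:
            return ("",)
        # Wrap the index with a modulus so any integer maps onto a valid line.
        return (lines[line_index % len(lines)],)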

MrNeon commented 10 months ago

@CHollman82 It's not the same functionality.

What juancopi81's code allows is generating a batch of images where each image has a different prompt. It's a very useful ability, and just what I was looking for to make Stable Video Diffusion easier to use.

bmf-info commented 9 months ago

(quoting juancopi81's original comment and modified CLIPTextEncode code from above)

So with your code I just have to write 6 lines (6 different prompts) to create 6 different images in one generation, right? I've been looking for a way to do this but haven't found anything yet.

juancopi81 commented 9 months ago

Hi @bmf-info, I have not used it lately, but yeah, the idea was to generate different images within one batch, with each image having its own prompt. Please let me know if it works for you :)

WeeBull commented 8 months ago

For those that find this ticket and are looking for a way...

The Inspire pack from @ltdrdata has a "Read Prompts from File" node. Combined with the "Unzip Prompts" node, it gives you lists of positive and negative prompts that you can CLIP encode and KSample. Example flow attached (needs the Impact and Inspire node packs).

prompt_batch.json

tusharbhutt commented 5 months ago

I'd love to see this added as native functionality! I haven't been able to figure out how to make the Inspire JSON workflow pull from a random Dynamic Prompts file, so if the above were implemented, I'd appreciate it.

SayanoAI commented 3 months ago

(quoting juancopi81's original comment and modified CLIPTextEncode code from above)

With that code, the final structure is [[batched_cond_tensor, {"pooled_output": [{"pooled_output": pooled}, {"pooled_output": pooled}, ...]}]], because batched_pooled is a list of {"pooled_output": pooled} dicts.

Is the batched pooled output supposed to be nested? Will this allow us to apply a different prompt to each latent in the batch?
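One way the pooled outputs could plausibly be batched instead of nested, shown as a standalone sketch rather than a patch to the code above (it assumes every prompt tokenizes to the same sequence length, so the per-prompt tensors can simply be concatenated along the batch dimension; the function name is hypothetical):

import torch

def encode_prompt_list(clip, prompts):
    """Sketch: encode a list of prompts into one batched conditioning entry."""
    conds, pooleds = [], []
    for prompt in prompts:
        tokens = clip.tokenize(prompt)
        cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
        conds.append(cond)        # each cond: [1, seq_len, dim]
        pooleds.append(pooled)    # each pooled: [1, pooled_dim]
    batched_cond = torch.cat(conds, dim=0)      # [batch, seq_len, dim]
    batched_pooled = torch.cat(pooleds, dim=0)  # [batch, pooled_dim]
    return [[batched_cond, {"pooled_output": batched_pooled}]]

Whether downstream samplers then apply one prompt per latent is a separate question about how ComfyUI consumes batched conditioning, so treat this only as a shape-level illustration.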

envy-ai commented 2 weeks ago

(quoting WeeBull's earlier suggestion of the Inspire pack's "Read Prompts from File" and "Unzip Prompts" nodes, with the attached prompt_batch.json example flow)

This processes prompts sequentially, not simultaneously. It doesn't do what people are looking for here, which is to allow multiple different prompts within a single batched generation (as when you set batch_size > 1 in the Empty Latent Image node).

If you use it with, say, a batch size of 4, it runs every prompt from the zipped prompt list as its own batch of 4 images, rather than applying 4 different prompts across one batch.