comfyanonymous / ComfyUI

[Feature Request] Please add "reference_only" ControlNet feature #661

Open MoonMoon82 opened 1 year ago

MoonMoon82 commented 1 year ago

There is a new ControlNet feature called "reference_only", which seems to be a preprocessor that works without any ControlNet model. Please add this feature to the ControlNet nodes.

Kind regards

https://www.youtube.com/watch?v=tBwmbTwMxfQ

MoonMoon82 commented 1 year ago

@BlenderNeko Maybe you have an idea how this "reference_only" preprocessor could work in ComfyUI?

BlenderNeko commented 1 year ago

Reference only is way more involved, as it is technically not a controlnet and would require changes to the UNet code. There has been some talk and thought about implementing it in Comfy, but so far the consensus was to at least wait a bit for the reference_only implementation in the ControlNet repo to stabilize, or to have some source that clearly explains what they are doing and why.

It's likely that we'd see an implementation of this before any kind of reference only support, simply because of ease of implementation. Perhaps that could in part fill a similar role.
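(As later comments in this thread show, this ended up being possible without core UNet changes, via ComfyUI's attention-patch hook: the reference latent is concatenated into the batch and self-attention is patched so the generated samples also attend to the reference's features. Below is a minimal sketch of the mechanism, simplified from the reference_apply function posted further down; the real node also handles cond/uncond batching and masks the reference out of the final output.)

import torch

def reference_attn_patch(q, k, v, extra_options):
    # Self-attention (attn1) patch: sample 0 in the batch is assumed to be
    # the reference latent. Double the key sequence and copy the reference's
    # features into the second half, so every generated sample can attend
    # to the reference as well as to itself.
    k = k.clone().repeat(1, 2, 1)      # (batch, 2 * seq_len, dim)
    for x in range(1, q.shape[0]):
        k[x, q.shape[1]:] = q[0, :]    # inject the reference's features
    return q, k, k                     # the doubled keys double as values

# Hypothetical wiring, following the ModelPatcher API used later in this thread:
# patched = model.clone()
# patched.set_model_attn1_patch(reference_attn_patch)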

GamingDaveUk commented 1 year ago

+1 for the request to have ControlNet reference.

I was trying to make emotion cards for SillyTavern. In ComfyUI that should have been a doddle: set up a workflow that generates an image, then uses that image to create 26 more, each with a different emotion but the same person. Without reference_only, though, that's just not doable.

catboxanon commented 1 year ago

Guess it's not going to be implemented. https://desuarchive.org/g/thread/94223958/#94225957

comfyanonymous commented 1 year ago

Here's a simple node for it; if it works fine I'll put it somewhere more visible. Download and save reference_only.py to your custom_nodes folder: https://gist.github.com/comfyanonymous/343e5675f9a2c8281fde0c440df2e2c6

Copy this and paste (Ctrl-V) it into the UI to load the workflow:

{
  "last_node_id": 15,
  "last_link_id": 37,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        188
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            9
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        233,
        117
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "crude drawing of girl"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        237,
        370
      ],
      "size": {
        "0": 425.27801513671875,
        "1": 180.6060791015625
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863,
        186
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 37
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        719286772344905,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1548,
        180
      ],
      "size": [
        1454.6668601568254,
        548.2885143635223
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "properties": {},
      "widgets_values": [
        "refer/ComfyUI"
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        -563,
        510
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            32
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            8,
            20
          ],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_1.0.safetensors"
      ]
    },
    {
      "id": 14,
      "type": "ImageScale",
      "pos": [
        -129,
        763
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            18
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        768,
        768,
        "center"
      ]
    },
    {
      "id": 13,
      "type": "LoadImage",
      "pos": [
        -483,
        777
      ],
      "size": {
        "0": 315,
        "1": 314
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            19
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "example.png",
        "image"
      ]
    },
    {
      "id": 15,
      "type": "ReferenceOnlySimple",
      "pos": [
        515,
        675
      ],
      "size": {
        "0": 315,
        "1": 78
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 32,
          "slot_index": 0
        },
        {
          "name": "reference",
          "type": "LATENT",
          "link": 35
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            37
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "ReferenceOnlySimple"
      },
      "widgets_values": [
        2
      ]
    },
    {
      "id": 12,
      "type": "VAEEncode",
      "pos": [
        248,
        732
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 18,
          "slot_index": 0
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 20,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      18,
      14,
      0,
      12,
      0,
      "IMAGE"
    ],
    [
      19,
      13,
      0,
      14,
      0,
      "IMAGE"
    ],
    [
      20,
      4,
      2,
      12,
      1,
      "VAE"
    ],
    [
      32,
      4,
      0,
      15,
      0,
      "MODEL"
    ],
    [
      34,
      15,
      1,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      12,
      0,
      15,
      1,
      "LATENT"
    ],
    [
      37,
      15,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {},
  "version": 0.4
}
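As an aside, each entry in the workflow's links array above follows LiteGraph's serialization format: [link_id, from_node_id, from_output_slot, to_node_id, to_input_slot, type]. A small sketch (assuming the JSON above has been saved as workflow.json, a hypothetical filename) that prints the graph wiring:

import json

# Load the workflow JSON pasted above (assumed saved as workflow.json).
with open("workflow.json") as f:
    wf = json.load(f)

# Map node ids to node types for readable output.
node_types = {n["id"]: n["type"] for n in wf["nodes"]}

# LiteGraph serializes each link as:
# [link_id, from_node, from_output_slot, to_node, to_input_slot, type]
for link_id, src, src_slot, dst, dst_slot, ltype in wf["links"]:
    print(f"{node_types[src]}[{src_slot}] --{ltype}--> {node_types[dst]}[{dst_slot}]")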
comfyanonymous commented 1 year ago

I put it in this repo: https://github.com/comfyanonymous/ComfyUI_experiments

patriciagomesoo commented 1 year ago

Hello, I'm trying to install reference_only but I get this error:

"title":"ComfyUI_experiments/reference_only.py at master · comfyanonymous/ComfyUI_experiments","locale":"en"} NameError: name 'true' is not defined

Cannot import C:\Users\GABSU\comfyui\ComfyUI\custom_nodes\reference_only.py module for custom nodes: name 'true' is not defined

Any idea why this is happening? I've added reference_only.py to the custom_nodes folder. Thank you so much for your work :)

julien-blanchon commented 1 year ago

Hey @comfyanonymous, the installation workflow works well for me, but the results are pretty bad. Could you share a workflow example that works well for you? I'm far from the results shown in the video.

addddd2 commented 1 year ago

Wow! It works perfectly for me. Is it possible to use two reference inputs? To process video frames, for example.

MoonMoon82 commented 1 year ago

@addddd2 I opened a feature request some weeks ago to get something like img2img with reference_only node: https://github.com/comfyanonymous/ComfyUI_experiments/issues/5

I already tried it on my own, but I guess this kind of img2img does not work that way. The results looked more like the input image than the reference image. Maybe @comfyanonymous could say something more about it...

addddd2 commented 1 year ago

Sorry for this code, I did the best I could. I hope the author does the right thing.

It takes as input two reference latents and one latent intended for img2img:

--- reference_only.py   2023-07-26 22:24:24.000000000 +0300
+++ reference_only3.py  2023-08-25 00:51:27.233217800 +0300
@@ -1,10 +1,12 @@
 import torch

-class ReferenceOnlySimple:
+class ReferenceOnlySimple3:
     @classmethod
     def INPUT_TYPES(s):
         return {"required": { "model": ("MODEL",),
                               "reference": ("LATENT",),
+                              "reference2": ("LATENT",),
+                              "input": ("LATENT",),
                               "batch_size": ("INT", {"default": 1, "min": 1, "max": 64})
                               }}

@@ -13,28 +15,31 @@

     CATEGORY = "custom_node_experiments"

-    def reference_only(self, model, reference, batch_size):
+    def reference_only(self, model, reference, reference2, input, batch_size):
         model_reference = model.clone()
         size_latent = list(reference["samples"].shape)
         size_latent[0] = batch_size
-        latent = {}
-        latent["samples"] = torch.zeros(size_latent)
+        latent = input

-        batch = latent["samples"].shape[0] + reference["samples"].shape[0]
+        batch = latent["samples"].shape[0] + reference["samples"].shape[0] + reference2["samples"].shape[0]
+  
+        
         def reference_apply(q, k, v, extra_options):
             k = k.clone().repeat(1, 2, 1)
             offset = 0
             if q.shape[0] > batch:
                 offset = batch
+                
+            re = extra_options["transformer_index"] % 2

             for o in range(0, q.shape[0], batch):
                 for x in range(1, batch):
-                    k[x + o, q.shape[1]:] = q[o,:]
+                    k[x + o, q.shape[1]:] = q[o + re,:]

             return q, k, k

         model_reference.set_model_attn1_patch(reference_apply)
-        out_latent = torch.cat((reference["samples"], latent["samples"]))
+        out_latent = torch.cat((reference["samples"], reference2["samples"], latent["samples"]))
         if "noise_mask" in latent:
             mask = latent["noise_mask"]
         else:
@@ -47,8 +52,8 @@
             mask = mask.repeat(latent["samples"].shape[0], 1, 1)

         out_mask = torch.zeros((1,mask.shape[1],mask.shape[2]), dtype=torch.float32, device="cpu")
-        return (model_reference, {"samples": out_latent, "noise_mask": torch.cat((out_mask, mask))})
+        return (model_reference, {"samples": out_latent, "noise_mask": torch.cat((out_mask,out_mask, mask))})

 NODE_CLASS_MAPPINGS = {
-    "ReferenceOnlySimple": ReferenceOnlySimple,
+    "ReferenceOnlySimple3": ReferenceOnlySimple3,
 }
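For readers who want to see the pre-patch node in one piece, the original reference_only.py can be reconstructed from the diff's context lines. The few mask-handling lines that fall between the hunks are filled in below to match the published ComfyUI_experiments version, so treat this as an approximation rather than the gist verbatim:

import torch

class ReferenceOnlySimple:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": { "model": ("MODEL",),
                              "reference": ("LATENT",),
                              "batch_size": ("INT", {"default": 1, "min": 1, "max": 64})
                              }}

    RETURN_TYPES = ("MODEL", "LATENT")
    FUNCTION = "reference_only"

    CATEGORY = "custom_node_experiments"

    def reference_only(self, model, reference, batch_size):
        model_reference = model.clone()
        # Empty latents of the same spatial size as the reference.
        size_latent = list(reference["samples"].shape)
        size_latent[0] = batch_size
        latent = {}
        latent["samples"] = torch.zeros(size_latent)

        # Total batch = generated latents + the reference latent.
        batch = latent["samples"].shape[0] + reference["samples"].shape[0]
        def reference_apply(q, k, v, extra_options):
            k = k.clone().repeat(1, 2, 1)
            offset = 0
            if q.shape[0] > batch:  # cond + uncond run in one batch
                offset = batch      # (computed but unused in the posted code)

            # Let every generated sample attend to the reference (index o).
            for o in range(0, q.shape[0], batch):
                for x in range(1, batch):
                    k[x + o, q.shape[1]:] = q[o,:]

            return q, k, k

        model_reference.set_model_attn1_patch(reference_apply)
        out_latent = torch.cat((reference["samples"], latent["samples"]))
        if "noise_mask" in latent:
            mask = latent["noise_mask"]
        else:
            mask = torch.ones((64, 64), dtype=torch.float32, device="cpu")

        if len(mask.shape) < 3:
            mask = mask.unsqueeze(0)
        if mask.shape[0] < latent["samples"].shape[0]:
            mask = mask.repeat(latent["samples"].shape[0], 1, 1)

        # Zero mask over the reference so it is never re-noised or overwritten.
        out_mask = torch.zeros((1,mask.shape[1],mask.shape[2]), dtype=torch.float32, device="cpu")
        return (model_reference, {"samples": out_latent, "noise_mask": torch.cat((out_mask, mask))})

NODE_CLASS_MAPPINGS = {
    "ReferenceOnlySimple": ReferenceOnlySimple,
}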
ntdviet commented 1 year ago

@addddd2 Can you show your workflow? Somehow it is not working very well for me.


addddd2 commented 1 year ago

@ntdviet the workflows are here: https://github.com/comfyanonymous/ComfyUI_experiments/issues/5

Jannchie commented 7 months ago

I made a custom node that supports reference_only and reference_only + AdaIN, and it can also adjust the style strength.

This is a diffusers-based custom node, which is used differently than Comfy's KSampler-based one.

https://github.com/Jannchie/ComfyUI-J https://civitai.com/models/361265/comfyui-j-diffusers-based-pipeline-nodes

michP247 commented 2 months ago

I made a custom node that supports reference_only and reference_only + AdaIN, and it can also adjust the style strength.

This is a diffusers-based custom node, which is used differently than Comfy's KSampler-based one.

https://github.com/Jannchie/ComfyUI-J https://civitai.com/models/361265/comfyui-j-diffusers-based-pipeline-nodes

Looks epic, can't wait for LoRA support.

wujohns commented 2 months ago

I made a custom node that supports reference_only and reference_only + AdaIN, and it can also adjust the style strength.

This is a diffusers-based custom node, which is used differently than Comfy's KSampler-based one.

https://github.com/Jannchie/ComfyUI-J https://civitai.com/models/361265/comfyui-j-diffusers-based-pipeline-nodes

This is the best reference_only for ComfyUI, waiting for SDXL support.