Chaoses-Ib / ComfyScript

A Python frontend and library for ComfyUI
https://discord.gg/arqJbtEg7w
MIT License

Getting vastly different output ComfyScript vs ComfyUI - Pulid Flux custom node #75

Open lschaupp opened 1 month ago

lschaupp commented 1 month ago

Here is a minimal example:

from comfy_script.runtime.real import *
load()
from comfy_script.runtime.real.nodes import *
import comfy.model_management
from comfy.model_management import xformers_enabled, vae_dtype, get_free_memory
import uuid
import json
import io
import numpy as np
import pdb
import os
import gc
import sys
from torch._C import BoolType
import torchvision.transforms as T
from diffusers.utils import load_image
import cv2
import traceback
from math import atan2, pi, ceil
from PIL import Image, ImageFilter
from typing import Optional
import torch

transform = T.ToPILImage()

def run_pulid():
    with torch.inference_mode():
        noise = RandomNoise(52535031757376)
        model = UNETLoader('flux1-dev-fp8.safetensors', 'default')
        pulidflux = PulidFluxModelLoader('pulid_flux_v0.9.0.safetensors')
        eva_clip = PulidFluxEvaClipLoader()
        faceanalysis = PulidFluxInsightFaceLoader('CUDA')
        image, _ = LoadImage('test.jpg')
        model2 = ApplyPulidFlux(model, pulidflux, eva_clip, faceanalysis, image, 1, 0, 1, None)
        clip = DualCLIPLoader('t5xxl_fp16.safetensors', 'clip_l.safetensors', 'flux')
        clip_text_encode_positive_prompt_conditioning = CLIPTextEncode("a woman as a pirate sitting on a throne. Behind her is a ship.", clip)
        clip_text_encode_positive_prompt_conditioning = FluxGuidance(clip_text_encode_positive_prompt_conditioning, 3.5)
        guider = BasicGuider(model2, clip_text_encode_positive_prompt_conditioning)
        sampler = KSamplerSelect('euler')
        denoise = 0.95
        sigmas = BasicScheduler(model, 'beta', 20, denoise)
        image3, _ = LoadImage('scene03.png')
        vae = VAELoader('ae.safetensors')
        latent = VAEEncode(image3, vae)
        latent, _ = SamplerCustomAdvanced(noise, guider, sampler, sigmas, latent)
        image4 = VAEDecode(latent, vae)
        for idx, img in enumerate(image4):
            img = transform(img.permute(2, 0, 1))
            path = "./pulid_{idx}.jpg".format(idx=idx)
            img.save(path)
            print("Saved image as:",path)

run_pulid()

When I run the nodes in ComfyUI (visually), the pipeline works and I get the correct image. However, when I run it via ComfyScript, I get a completely different picture. Something is very off.

Chaoses-Ib commented 1 month ago

Does virtual mode have the correct result?

By the way, there is a built-in node ImageToPIL to convert VAE decoded images to PIL images in real mode.

lschaupp commented 1 month ago

I can confirm, virtual mode does not have the issue. Any hints on how to fix this for real mode?

I pulled the latest changes of this repo - to be up to date. Still the same issue.

I am getting this warning in real mode when loading the pretrained EVA02-CLIP-L-14-336 weights (eva_clip).

incompatible_keys.missing_keys: ['visual.rope.freqs_cos', 'visual.rope.freqs_sin', 'visual.blocks.0.attn.rope.freqs_cos', 'visual.blocks.0.attn.rope.freqs_sin', 'visual.blocks.1.attn.rope.freqs_cos', 'visual.blocks.1.attn.rope.freqs_sin', 'visual.blocks.2.attn.rope.freqs_cos', 'visual.blocks.2.attn.rope.freqs_sin', 'visual.blocks.3.attn.rope.freqs_cos', 'visual.blocks.3.attn.rope.freqs_sin', 'visual.blocks.4.attn.rope.freqs_cos', 'visual.blocks.4.attn.rope.freqs_sin', 'visual.blocks.5.attn.rope.freqs_cos', 'visual.blocks.5.attn.rope.freqs_sin', 'visual.blocks.6.attn.rope.freqs_cos', 'visual.blocks.6.attn.rope.freqs_sin', 'visual.blocks.7.attn.rope.freqs_cos', 'visual.blocks.7.attn.rope.freqs_sin', 'visual.blocks.8.attn.rope.freqs_cos', 'visual.blocks.8.attn.rope.freqs_sin', 'visual.blocks.9.attn.rope.freqs_cos', 'visual.blocks.9.attn.rope.freqs_sin', 'visual.blocks.10.attn.rope.freqs_cos', 'visual.blocks.10.attn.rope.freqs_sin', 'visual.blocks.11.attn.rope.freqs_cos', 'visual.blocks.11.attn.rope.freqs_sin', 'visual.blocks.12.attn.rope.freqs_cos', 'visual.blocks.12.attn.rope.freqs_sin', 'visual.blocks.13.attn.rope.freqs_cos', 'visual.blocks.13.attn.rope.freqs_sin', 'visual.blocks.14.attn.rope.freqs_cos', 'visual.blocks.14.attn.rope.freqs_sin', 'visual.blocks.15.attn.rope.freqs_cos', 'visual.blocks.15.attn.rope.freqs_sin', 'visual.blocks.16.attn.rope.freqs_cos', 'visual.blocks.16.attn.rope.freqs_sin', 'visual.blocks.17.attn.rope.freqs_cos', 'visual.blocks.17.attn.rope.freqs_sin', 'visual.blocks.18.attn.rope.freqs_cos', 'visual.blocks.18.attn.rope.freqs_sin', 'visual.blocks.19.attn.rope.freqs_cos', 'visual.blocks.19.attn.rope.freqs_sin', 'visual.blocks.20.attn.rope.freqs_cos', 'visual.blocks.20.attn.rope.freqs_sin', 'visual.blocks.21.attn.rope.freqs_cos', 'visual.blocks.21.attn.rope.freqs_sin', 'visual.blocks.22.attn.rope.freqs_cos', 'visual.blocks.22.attn.rope.freqs_sin', 'visual.blocks.23.attn.rope.freqs_cos', 'visual.blocks.23.attn.rope.freqs_sin']

In virtual mode, I am not observing this warning.

Chaoses-Ib commented 1 month ago

What seems most suspicious to me is the image transformation. Does ImageToPIL give the same result with the following code?

for idx, img in enumerate(ImageToPIL(image4)):
    path = "./pulid_{idx}.jpg".format(idx=idx)
    img.save(path)
    print("Saved image as:",path)

lschaupp commented 1 month ago

I am getting the same image with ImageToPIL, so the issue still persists.

I also used SaveImage() (Comfy) to compare, and I am getting the same image as with ImageToPIL. So the issue is somewhere else.

Chaoses-Ib commented 1 month ago

Regarding the incompatible_keys.missing_keys warning for the visual.*.rope.freqs_cos / freqs_sin keys: the author of PuLID ComfyUI said this is fine: https://github.com/cubiq/PuLID_ComfyUI/issues/7#issuecomment-2127488360 . Many other users also get this warning and it seems to work without problems, so it's probably not the cause.
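
To double-check that the warning really lists only the rope.freqs_cos / rope.freqs_sin entries and that no other weight is missing, the key list can be filtered. This is only a minimal sketch: `unexpected_missing_keys` is a hypothetical helper, not part of PuLID or ComfyScript, and it assumes the list from the warning has been copied into a Python list.

```python
def unexpected_missing_keys(missing_keys,
                            allowed_suffixes=("rope.freqs_cos", "rope.freqs_sin")):
    """Return the missing keys that do NOT end with one of the allowed suffixes.

    Hypothetical helper: an empty result means every missing key is one of the
    rope.freqs_* entries that show up in the warning.
    """
    suffixes = tuple(allowed_suffixes)
    return [key for key in missing_keys if not key.endswith(suffixes)]


# A shortened copy of the list from the warning, for illustration only.
keys_from_warning = [
    "visual.rope.freqs_cos",
    "visual.rope.freqs_sin",
    "visual.blocks.0.attn.rope.freqs_cos",
    "visual.blocks.0.attn.rope.freqs_sin",
]
print(unexpected_missing_keys(keys_from_warning))
```

If the result is not empty, the remaining keys are worth a closer look before ruling the warning out.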

lschaupp commented 1 month ago

Gotcha. Agreed.

lschaupp commented 1 month ago

@Chaoses-Ib any tips on how to debug this? I understand that I can observe the output values of each node in real mode. However, how can I get the output values in virtual mode, so that I can compare the arrays of each output and see if there is any difference?

Chaoses-Ib commented 1 month ago

There is no built-in support for this except for Image outputs (#29). However, you can hook the nodes yourself with the standalone (in-process) runtime, i.e. close the ComfyUI server and then run:

from comfy_script.runtime import *
# load(args=ComfyUIArgs('--disable-all-custom-nodes'))
load()
from comfy_script.runtime.nodes import *

# Hook all nodes
import comfy_script.runtime.nodes
for node in comfy_script.runtime.nodes.__all__:
    node = getattr(comfy_script.runtime.nodes, node)
    info = getattr(node, 'info', None)
    if info is None:
        continue
    cls = info['_cls']

    f = getattr(cls, '_orig_f', None)
    if f is None:
        f = getattr(cls, cls.FUNCTION)
        setattr(cls, '_orig_f', f)
    def hook(*args, _cls=cls, _orig_f=f, **kwargs):
        print(f"Running {_cls.__name__}")
        outputs = _orig_f(*args, **kwargs)
        # Do whatever you want with the outputs, note it's always a tuple/dict, not auto-unpacked like in real mode
        print(outputs)
        return outputs
    setattr(cls, cls.FUNCTION, hook)

queue.watch_display(False)
with Workflow():
    model, clip, vae = CheckpointLoaderSimple('v1-5-pruned-emaonly.ckpt')
    conditioning = CLIPTextEncode('beautiful scenery nature glass bottle landscape, , purple galaxy bottle,', clip)
    conditioning2 = CLIPTextEncode('text, watermark', clip)
    latent = EmptyLatentImage(512, 512, 1)
    latent = KSampler(model, 4, 20, 8, 'euler', 'normal', conditioning, conditioning2, latent, 1)
    image = VAEDecode(latent, vae)
    SaveImage(image, 'ComfyUI')

Also note that only nodes whose inputs have changed are re-executed on each run.
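
Once the outputs of each node have been recorded in both modes (for example by storing `outputs` inside `hook`), they can be compared leaf by leaf. The following is only a sketch under assumptions: `iter_leaves` and `diff_outputs` are hypothetical helpers, not part of ComfyScript, and they assume that tensor-like outputs expose `.tolist()` (as PyTorch tensors and NumPy arrays do). Other objects, such as models, are compared with `==`, so it is best to restrict the comparison to tensor outputs. Converting large tensors with `.tolist()` is slow, so this is meant for spot checks rather than full images.

```python
import math
from typing import Any, Iterator, List, Tuple


def iter_leaves(value: Any, path: str = "out") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf) pairs for a nested tuple/list/dict structure."""
    # Tensors and NumPy arrays both expose .tolist(); turn them into nested lists.
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, dict):
        for key in sorted(value, key=str):
            yield from iter_leaves(value[key], f"{path}[{key!r}]")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from iter_leaves(item, f"{path}[{index}]")
    else:
        yield path, value


def diff_outputs(a: Any, b: Any,
                 rel_tol: float = 1e-5,
                 abs_tol: float = 1e-6) -> List[Tuple[str, Any, Any]]:
    """Return (path, a_value, b_value) for every leaf that differs."""
    leaves_a = dict(iter_leaves(a))
    leaves_b = dict(iter_leaves(b))
    diffs = []
    for path in sorted(leaves_a.keys() | leaves_b.keys()):
        # A path present on one side only means the shapes differ.
        if path not in leaves_a or path not in leaves_b:
            diffs.append((path, leaves_a.get(path), leaves_b.get(path)))
            continue
        x, y = leaves_a[path], leaves_b[path]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            same = math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        else:
            same = x == y
        if not same:
            diffs.append((path, x, y))
    return diffs
```

For example, calling `diff_outputs(outputs_real, outputs_virtual)` on the recorded outputs of the same node (both variable names are placeholders) returns an empty list when the two runs agree within the tolerances. The first node with a non-empty result is a good place to start looking.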