cubiq / ComfyUI_IPAdapter_plus

GNU General Public License v3.0
4.01k stars · 307 forks

IPAdapter problem after commit c28... (ComfyUI won't start generating) #109

Closed by patientx 7 months ago

patientx commented 10 months ago

PC: Windows 10, 16 GB DDR4-3000, RX 6600, using DirectML with no additional command parameters.

Updated today with Manager and tried my usual workflow, which uses IPAdapter for faces. When it comes to actually generating the image, it just stops there; nothing happens. I have to close the command prompt and relaunch the app.

Loading 1 new model 0%| | 0/8 [00:00<?, ?it/s]

I was mainly using LCM, so I tried normal SD 1.5 models: same result. Tried different samplers: same. If I take IPAdapter out of the workflow, everything works correctly. Since I don't update daily, I figured something in between had changed, so I tried every commit and found that it works up until the timestepping fix; after that, the problem appears. So now I am on c28a04466b17d760a345aea41d6a593c0a312c95 (the last one before that fix), and everything works as before.

cubiq commented 10 months ago

@laksjdjf it's your code I believe, can you take a look at it?

We can revert that change without issues.

@patientx can you post your startup message from comfyui?

cubiq commented 10 months ago

sorry I misunderstood... so it's 1ed4f93 that is not working.

did you update comfyui?

patientx commented 10 months ago

I always use the latest version of ComfyUI; I always update at start with git pull. By the way, at first I tried using previous commits of ComfyUI, and it was around 30 commits back that the extension at its latest version still worked. But since Comfy is the main app and its latest additions are more important, I'd rather fix the problem in the node. So here we are :)

Here is comfyui startup , https://pastebin.com/a6AUGPNu

For the record I tried deleting all the other custom nodes except this one, still had the same problem.

patientx commented 10 months ago

ComfyUI updated and they say they resolved the IPAdapter issue, so I updated both Comfy and this node to the latest. Nope, still the same issue.

cubiq commented 10 months ago

the latest version overrides timestepping if it's not supported, so I really don't know how to help. If it doesn't give any error, I would tend to suspect a driver issue, or maybe you need to upgrade the Python libraries too. Can you try with the latest nightly (instead of the stable one)?

MGTRIDER commented 10 months ago

Hi there, I can vouch that this is not a driver problem, as I was still using your nodes fine today until I updated the nodes, including ComfyUI, to the latest version. What basically happens is that the cmd window shows that inference is starting, but then it literally keeps sitting there on the first step until, after a while, ComfyUI just crashes.

MGTRIDER commented 10 months ago

If it helps, I am on this release: ComfyUI Revision: 1758 [6b769bca] | Released on '2023-11-30'. I am also using an AMD GPU. As I said before, everything was working fine today until I updated to the latest commit.

cubiq commented 10 months ago

I feel this is more of a ComfyUI issue than IPAdapter, but I'll check with comfyanonymous

do you get any error at all?

MGTRIDER commented 10 months ago

I feel this is more of a ComfyUI issue than IPAdapter, but I'll check with comfyanonymous

do you get any error at all?

No, no error. Everything loads fine until the part where it starts to generate the image; then it just hangs on step 0 of 40. But as soon as I bypass the IPAdapter node, everything works fine again. It also doesn't matter whether I use timestepping or not, nor whether batch unfold is on or off. It does the same thing.

cubiq commented 10 months ago

if you do not use timestepping or batch unfold, the code is unchanged. It's just an IF statement, so if you don't use either, everything is the same as before.
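[Editor's note] The gating described here can be sketched as follows. This is a minimal hypothetical sketch, not the actual IPAdapterPlus.py code; it assumes the bounds come from percent_to_sigma, with sigmas decreasing over the course of sampling:

```python
# Hypothetical sketch of the timestepping IF statement: sigma_start/sigma_end
# bound the noise levels at which the adapter is applied. With the default
# start_at=0.0 / end_at=1.0 the bounds become (very large, 0.0), so the check
# passes on every step -- i.e. behavior identical to the pre-timestepping code.
def should_apply_adapter(sigma: float, sigma_start: float, sigma_end: float) -> bool:
    return sigma_end <= sigma <= sigma_start

print(should_apply_adapter(14.6, float("inf"), 0.0))  # True: default bounds
print(should_apply_adapter(14.6, 10.0, 0.0))          # False: start_at was raised
```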

It is important that you refresh the browser cache (or use an incognito window), then delete the IPAdapter nodes and recreate them.

If the execution halts and then crashes without errors it is very likely a driver issue.

I talked to Comfyanonymous this morning, who confirmed my suspicions:

[screenshot]

Unfortunately DirectML is very unstable on Windows and it would require a lot of debugging to reach the source of the problem and it could be even related to a specific GPU.

Anyway I made a small change just now that could help (don't hold your breath though). You can give it a try.

MGTRIDER commented 10 months ago

I deleted the node manually and used git clone to copy the repository with the change you made. Sadly, no dice; still the same thing. One question: is it normal that it also doesn't work in CPU mode? Seems like AMD users are getting the short end of the stick again. But even so, thank you for this great node you made. It was fun to work with. Keep up the good work. Thumbs up to you, sir.

cubiq commented 10 months ago

in CPU mode it could be just some kind of overflow; I'll see if I can replicate it

MGTRIDER commented 10 months ago

In CPU mode, when I use the standard quad cross-attention, it hangs the same as in DirectML mode. When I try to use split cross-attention, it complains about an autocast 'cuda' or 'cpu' error unless I force unet-bf16, but then it hangs the same again, lol. For the record, my CPU is a Ryzen 5600G.

patientx commented 10 months ago

just use the old commit, everything is working there

MGTRIDER commented 10 months ago

just use the old commit, everything is working there

Hi there, do you mean the commit of IPAdapter plus before the timestepping feature? If so, how do I go back to the version before that commit?

MGTRIDER commented 10 months ago

Never mind, I found out how to do it. Thanks for the suggestion. It's still weird, though, how this commit works fine but since the timestepping commit inference just refuses to work, especially when a ControlNet node that has a timestepping feature works pretty much fine.

cubiq commented 10 months ago

does 1ed4f93 work?

cubiq commented 10 months ago

if you are on c28a044 can you place this code at line 388 and tell me what it says in the command prompt? (delete them afterwards)

        print(model.model.model_sampling.percent_to_sigma(0.0))
        print(model.model.model_sampling.percent_to_sigma(1.0))

Thanks (sorry I don't know how else to test this)
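[Editor's note] For context, here is a rough standalone sketch of what percent_to_sigma might compute. This is an assumption based on ComfyUI's observed behavior of mapping a 0.0-1.0 denoising percentage onto the sampler's sigma schedule; the real implementation lives in comfy/model_sampling.py and differs in detail:

```python
import numpy as np

def percent_to_sigma(percent: float, sigmas: np.ndarray) -> float:
    # Sentinel values at the endpoints, so comparisons against any real sigma
    # always succeed at 0% ("from the first step") and 100% ("to the last step").
    if percent <= 0.0:
        return 999999999.9
    if percent >= 1.0:
        return 0.0
    idx = int(percent * (len(sigmas) - 1))
    return float(sigmas[idx])

schedule = np.linspace(14.6, 0.03, 1000)  # toy descending sigma schedule
print(percent_to_sigma(0.0, schedule))    # large sentinel
print(percent_to_sigma(1.0, schedule))    # 0.0
```

The two printed values are the "two long float numbers" the debug lines are meant to reveal.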

MGTRIDER commented 10 months ago

if you are on c28a044 can you place this code at line 388 and tell me what it says in the command prompt? (delete them afterwards)

        print(model.model.model_sampling.percent_to_sigma(0.0))
        print(model.model.model_sampling.percent_to_sigma(1.0))

Thanks (sorry I don't know how else to test this)

Hi there, in which file should I paste the code?

cubiq commented 10 months ago

IPAdapterPlus.py

please note you need to keep the indentation of the original file!

MGTRIDER commented 10 months ago

So this is the code I see at line 388:

def apply_ipadapter(self, ipadapter, model, weight, clip_vision=None, image=None, weight_type="original", noise=None, embeds=None, attn_mask=None, start_at=0.0, end_at=1.0):

Should I overwrite it?

cubiq commented 10 months ago

I don't know what version you are on... just put them above the line

self.dtype = model.model.diffusion_model.dtype

at the same indentation level; do not remove anything.

MGTRIDER commented 10 months ago

I don't know what version you are on... just put them above the line

self.dtype = model.model.diffusion_model.dtype

at the same indentation level; do not remove anything.

I am testing commit c28a044 that doesn't work.

cubiq commented 10 months ago

if you can add those lines it would help me understand, otherwise we can try on discord

PS: you need to stop comfy and restart

MGTRIDER commented 10 months ago

if you can add those lines it would help me understand, otherwise we can try on discord

Oh sorry, c28a044 does work, yes, but 1ed4f93 does not. I will go back to the working commit again and paste the code. Sorry for the confusion.

cubiq commented 10 months ago

you can put them in any commit you are on, it doesn't matter. I just need to know what values are returned. You should see two long float numbers in the command window

MGTRIDER commented 10 months ago

Screenshot 2023-12-02 054939

MGTRIDER commented 10 months ago

Should i also do one for the commit that doesn't work ?

cubiq commented 10 months ago

no, that's fine. thanks. the numbers are correct, so I'm really out of ideas here.

MGTRIDER commented 10 months ago

no, that's fine. thanks. the numbers are correct, so I'm really out of ideas here.

Haha, I understand. Even so, thanks again. At least I can still use your node, even if I have to stay on an older version for now. It gets the job done. So again, thumbs up to you, and keep up the good work.

cubiq commented 10 months ago

unless anyone is willing to give me access to their PC remotely...

MGTRIDER commented 10 months ago

I'd like to, but I can't really risk it, as I also use my PC for work, not just for hobbies. So I hope you understand.

cubiq commented 10 months ago

no worries, I wouldn't trust a rando on my own PC either

sleppyrobot commented 10 months ago

if you can add those lines it would help me understand, otherwise we can try on discord

Oh sorry, c28a044 does work, yes, but 1ed4f93 does not. I will go back to the working commit again and paste the code. Sorry for the confusion.

Hey there, thanks for posting which commit works; I had the same issue, also on an AMD GPU on Windows.

OtakuD commented 10 months ago

In case it's helpful or related, I get this error when using it on AMD too:

ComfyUI\comfy\model_sampling.py:74: UserWarning: The operator 'aten::frac.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
    w = t.frac()
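[Editor's note] For context, torch defines frac as frac(x) = x - trunc(x). A numpy stand-in (hypothetical, just to illustrate what the CPU fallback is computing for that one op):

```python
import numpy as np

def frac(t: np.ndarray) -> np.ndarray:
    # Fractional part keeping the sign, matching torch.frac's definition:
    # frac(x) = x - trunc(x)
    return t - np.trunc(t)

print(frac(np.array([1.25, -1.25])))   # [ 0.25 -0.25]
```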

cubiq commented 10 months ago

thanks for the additional info, but I already fall back to CPU there, so that shouldn't be a problem.

the only way for me to fix this issue is if someone gives me access to a Windows machine with AMD, or at the very least we could try Discord screen sharing with someone who knows how to use a text editor and at least a little Python.

MythicalChu commented 10 months ago

Echoing the "I'm on AMD, same issue right after updating IPAdapter_plus; bypassing it works fine (no error messages, just a hang on step 0)."

lord-lethris commented 10 months ago

rolling back to c28a044 temporarily fixed it for me.

Incidentally, AMD is getting ready to bring ROCm to Windows (just saw it on AMD's homepage), so hopefully no more DirectML restrictions on the Windows platform, and AMD-based AI should fly like NVIDIA!

patientx commented 10 months ago

Here is a weird one for you guys:

I updated all extensions as usual with Manager, and before rolling back to c28... I decided to try once again. Behold, it worked with the workflow I had at the time.

Then I tried to build a default workflow from a clean sheet and... it didn't work??? What I had in the other workflow was that I was experimenting with Kohya's HiRes fix AND HyperTile (to be able to generate images at higher quality somewhat faster).

Weird thing 1: I can't generate bigger than 512x960. (I normally generate at 512x768 or 512x512; with Kohya I can easily double that, but with this workflow it doesn't generate when I enter, for example, 768x960, 512x1024, or 1024x1536 (double). Like in A1111, it's as if my memory isn't enough to go higher than that.)

Weird thing 2: BOTH have to be used. If I disable one, the usual happens: no generation at start.

Weird thing 3: Kohya doesn't seem to be working at all, because there is no speed change in generation, but HyperTile is definitely working: if I change the tile size, the speed and final quality change (128 and low lowers the quality).

So this combo somehow lets me use ipadapter again.

Maybe someone more informed could get something out of this situation toward solving the problem.

weird workload that enables ipadapter on amd.json

TL;DR: when we use both Kohya's HiRes fix and HyperTile, something happens and the latest IPAdapter version works with AMD.

jtary commented 10 months ago

rolling back to c28a044 temporarily fixed it for me.

Incidentally, AMD is getting ready to bring ROCm to Windows (just saw it on AMD's homepage), so hopefully no more DirectML restrictions on the Windows platform, and AMD-based AI should fly like NVIDIA!

Unfortunately, MIOpen (the part of ROCm that PyTorch depends on) is further behind than ROCm in general. There is active work going on, but I think we'll be lucky to get a ROCm-enabled Windows PyTorch before 2025.

cubiq commented 10 months ago

can any of you try the latest update?

patientx commented 10 months ago

can any of you try the latest update?

This error comes up now:

Requested to load CLIPVisionModelProjection
Loading 1 new model
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "D:\ComfyUI\execution.py", line 153, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "D:\ComfyUI\execution.py", line 83, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "D:\ComfyUI\execution.py", line 76, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "D:\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus\IPAdapterPlus.py", line 464, in apply_ipadapter
    image_prompt_embeds, uncond_image_prompt_embeds = self.ipadapter.get_image_embeds(clip_embed.to(self.device, dtype=self.embeds_dtype), clip_embed_zeroed.to(self.device, dtype=self.embeds_dtype))
  File "D:\ComfyUI\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus\IPAdapterPlus.py", line 200, in get_image_embeds
    image_prompt_embeds = self.image_proj_model(clip_embed)
  File "D:\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus\IPAdapterPlus.py", line 41, in forward
    clip_extra_context_tokens = self.proj(image_embeds)
  File "D:\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "D:\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype

Even with a newly built, basic workflow. Previous versions at least worked with the trick I mentioned above (if I use Kohya HiRes fix + HyperTile, IPAdapter works; I explained it a few posts up).
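[Editor's note] The "mat1 and mat2 must have the same dtype" in the traceback above comes from torch's F.linear refusing mixed-precision inputs, e.g. half-precision CLIP embeds fed into a float32 projection layer. A minimal numpy stand-in (hypothetical, illustrating the failure mode rather than reproducing torch exactly):

```python
import numpy as np

def linear(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # Mimic torch's strictness: F.linear raises instead of silently promoting
    # when the input and weight dtypes differ.
    if x.dtype != weight.dtype:
        raise RuntimeError("mat1 and mat2 must have the same dtype")
    return x @ weight.T

weight = np.ones((4, 4), dtype=np.float32)   # projection weights (float32)
embeds = np.ones((1, 4), dtype=np.float16)   # half-precision image embeds
try:
    linear(embeds, weight)
except RuntimeError as e:
    print(e)                                 # mat1 and mat2 must have the same dtype

# The usual fix is casting the input to the layer's dtype first:
out = linear(embeds.astype(weight.dtype), weight)
```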

cubiq commented 10 months ago

what about now?

patientx commented 10 months ago

what about now?

That mat1... error disappeared; same problem as before, but it's still working with my method at least

jtary commented 10 months ago

Just gave it a try with https://github.com/cubiq/ComfyUI_IPAdapter_plus/commit/cb1c5e8a1af794a605092da78218ddaa06615235; seeing the same issue where the IPAdapter step runs fine, but when it gets to the sampler it just gets stuck at 0%.

Requested to load BaseModel
Loading 1 new model
  0%|                                                                                           | 0/30 [00:00<?, ?it/s]

JarekDerp commented 10 months ago

I ran some tests and edited the .py script, and I came to one conclusion: DirectML doesn't like "sigma_start" and "sigma_end". Once I removed them from the script (latest commit was 2023-12-17), it started to work again: [screenshot]

Obviously, when I change start_at and end_at in the web UI, it doesn't do anything.

patientx commented 10 months ago

I ran some tests and edited the .py script, and I came to one conclusion: DirectML doesn't like "sigma_start" and "sigma_end". Once I removed them from the script (latest commit was 2023-12-17), it started to work again: [screenshot]

Obviously, when I change start_at and end_at in the web UI, it doesn't do anything.

Like, all of them? Can you share the edited file?

cubiq commented 10 months ago

I ran some tests and edited the .py script, and I came to one conclusion: DirectML doesn't like "sigma_start" and "sigma_end". Once I removed them from the script (latest commit was 2023-12-17), it started to work again:

without the code this doesn't help much. What if you just rename the variables?

JarekDerp commented 10 months ago

Okay, I pinpointed it to two lines to edit. It's these two: [screenshot]

After removing them as in the screenshot, it works fine. The "start_at" and "end_at" parameters get ignored, but it generates the image, and I assume it applies the adapter across the whole generation from 0 to 100%.
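[Editor's note] A workaround along these lines might look like the sketch below. This is hypothetical (the real lines in IPAdapterPlus.py differ): it skips the percent_to_sigma calls, which appear to hang under DirectML, and substitutes bounds that every sampling step satisfies, so the adapter stays active for the whole generation and start_at/end_at are ignored:

```python
# Hypothetical sketch of the workaround: hardcode always-true sigma bounds
# instead of computing them through percent_to_sigma.
def effective_sigma_bounds(use_workaround: bool, model_sampling=None,
                           start_at: float = 0.0, end_at: float = 1.0):
    if use_workaround:
        return float("inf"), 0.0  # always active: start_at/end_at ignored
    # assumed original behavior: map percentages onto the sigma schedule
    return (model_sampling.percent_to_sigma(start_at),
            model_sampling.percent_to_sigma(end_at))

sigma_start, sigma_end = effective_sigma_bounds(True)
print(sigma_start, sigma_end)   # inf 0.0
```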

cubiq commented 10 months ago

mh, I believe the problem is the hardcoded value.

please add this at line 236:

    print(extra_options["sigmas"][0].item())

then, before line 250:

    print(sigma_start, sigma_end)