cubiq / ComfyUI_IPAdapter_plus

GNU General Public License v3.0
4.01k stars 307 forks source link

Ipadapter problem after commit c28... (comfyui won't start generating) #109

Closed patientx closed 7 months ago

patientx commented 10 months ago

PC : windows 10, 16 gb ddr4-3000, rx 6600, using directml with no additional command parameters.

Updated today with manager , and tried my usual workflow which has ipadapter included for faces, when it comes to actually generating the image , it just stops there, nothing happens. I have to close the command prompt and relaunch the app.

Loading 1 new model 0%| | 0/8 [00:00<?, ?it/s]

I was mainly using lcm so tried normal sd1.5 models , the same results. Tried different samplers , same. If I get ipadapter out of the workflow everything works correctly. So I am not updating daily , thought something in the middle changed, tried every commit and found out that it works up until "timestepping, fix" , after that the problem is here. So now I am on c28a04466b17d760a345aea41d6a593c0a312c95. (last one before that fix) . Everything works as it was before.

JarekDerp commented 10 months ago

I edited it like this: image

And then I got this:

Loading 1 new model
  0%|                                                                                            | 0/8 [00:00<?, ?it/s]
extra_options[sigmas][0].item()

And then nothing. CPU and GPU at 0%. Looks like it freezes trying to read what's in extra_options["sigmas"][0].item()

cubiq commented 10 months ago

just print(extra_options["sigmas"])

JarekDerp commented 10 months ago

Here's what I got:

sigmas
tensor([14.6146], device='privateuseone:0')
sigmas[0].float()
tensor(14.6146, device='privateuseone:0')
sigmas.item()
14.614642143249512
sigmas[0].item()
14.614642143249512

Well, looks like it works after all?

JarekDerp commented 10 months ago

I made edits like this: image

and it didn't work. It was stopping at line 237 without any explanation or error messages. Tried to remove the [0] and it didn't work either. When I changed it to: image

It started to spit out some numbers:

extra_options[sigmas]
tensor([14.6146], device='privateuseone:0')
if 'sigmas' in extra_options
Sigma in extra_options
999999999.9 0.0

Very strange problem.

cubiq commented 10 months ago

if you can join Discord we can try to fix it (I would be available in about 1:30h)

JarekDerp commented 10 months ago

Where can I find the discord invite link?

I feel like I'm getting closer. If only I knew what I'm doing. I changed the code to something like this: image

And now it sometimes work and sometimes not. On the preview in KSampler I can see that it sometimes gets to step 5 just fine, but then starts to show a black image. Sometimes it even manages to get to step 8, but most often I get a black image from the beginning.

Here's the log file, outputting the sigmas and the result of the if statement log.txt

cubiq commented 10 months ago

https://latent.vision/discord

JarekDerp commented 10 months ago

Ok, I got it to work. I don't know why, but it works and it even respects the start_at and end_at parameters correctly. The black image was just some issue with my ComfyUI. Looks like the problem was some strange behaviour when trying to do extra_options["sigmas"][0].item(). When I split it into two part it works fine.

         org_dtype = n.dtype
         cond_or_uncond = extra_options["cond_or_uncond"]
-        sigma = extra_options["sigmas"][0].item() if 'sigmas' in extra_options else 999999999.9
-
+        temp_sigma = extra_options["sigmas"] if 'sigmas' in extra_options else null
+        sigma = temp_sigma[0].item() if temp_sigma else 999999999.9
+        
         # extra options for AnimateDiff
         ad_params = extra_options['ad_params'] if "ad_params" in extra_options else None
cubiq commented 10 months ago

I would suggest to try this

        sigma = extra_options["sigmas"] if 'sigmas' in extra_options else None
        sigma = sigma[0].item() if sigma is not None else 999999999.9
JarekDerp commented 10 months ago

I would suggest to try this

        sigma = extra_options["sigmas"] if 'sigmas' in extra_options else None
        sigma = sigma[0].item() if sigma is not None else 999999999.9

This doesn't work, unfortunately. The same as before, just freezes when it gets to KSampler.

cubiq commented 10 months ago

doesn't make any sense

JarekDerp commented 10 months ago

I know. It doesn't make sense at all, even though I'm not familiar with Python. Well, at least this one works for some reason:

        PointlessVariable = extra_options["sigmas"][0]
        sigma = PointlessVariable.item() if PointlessVariable else 999999999.9

Or this one works too:

        sigma = extra_options["sigmas"][0]
        sigma = sigma.item() if sigma else 999999999.9

As if it likes bad syntax. When I was writing scripts in PowerShell I remember I used to have an issue where a command in line 89 got executed before the command in line 88. So for example code like this:

88  print("ABC")
89  print("abc")

got executed like this in the log:

abc
ABC

I get a similar vibe here.

JarekDerp commented 10 months ago

I think this needs to work like I'm the example here: https://pytorch.org/docs/stable/generated/torch.Tensor.item.html In the example they assigned a tensor to a variable, and then used the ".item()".

I think it works in a way so that it moves the information from GPU to CPU to be used in the calculation later and it cannot be used from GPU directly when using Directml. That's why it works when assigned to a variable first. That would also explain why it works fine when I run Comfyui on my CPU.

cubiq commented 10 months ago
sigma_fuuu_directml = extra_options["sigmas"] if 'sigmas' in extra_options else None
sigma = sigma_fuuu_directml[0].item() if sigma_fuuu_directml is not None else 999999999.9

so would this work?

JarekDerp commented 10 months ago

Nope, freezes. I tried many different things. I even tried "try except" like this:

        try:
            x = extra_options["sigmas"][0]
            sigma = x.item()
        except:
            sigma = 999999999.9
        print(sigma)

And nothing, still freezes while trying to read extra_options["sigmas"][0].item(). Looks like this is a problem with DirectML like comfyanonymous said. If I do just extra_options["sigmas"][0], then the comfy doesn't freeze and it continues, with black image, but at least it continues. And freezes as soon as it hits ".item()".

Here's example. Running this: image

I got info like this and it froze: image

I don't know why it didn't freeze in my previous examples. I tried looking into ttn nodes and efficiency nodes looking into how they've done it but I don't understand anything. I'm not good with Python as you might have guessed.

jtary commented 10 months ago

There seems to be some sort of race-condition going on. I can get the adapter to run, just by adding some prints:

+        print("Getting ready!")
+        print(f"Okay, about to get the item for {extra_options['sigmas']}")
         sigma = extra_options["sigmas"][0].item() if 'sigmas' in extra_options else 999999999.9
+        print(f"sigma: {sigma}")

I think extra_options['sigmas'] needs to be accessed to set some internal state. Without that state, trying to reference [0] causes the call the block. Some kind of lazy evaluation maybe?

JarekDerp commented 10 months ago

I also noticed that adding some prints unblocks it but if I unblock it with prints then I get black images and comfy complains about some img decoding script problems (error message at the end of the log file I attached to one of my messages). Oh well, I give up. I'm really considering just abandoning image generation altogether until AMD Devs finally starts supporting ROCm on Windows. One workaround after another just to generate a handful of decent images.

jtary commented 10 months ago

Yeah, it's frustrating. I really don't want to support nVidia, but both AMD and Microsoft seem to be really dropping the ball on supporting their platforms. torch-directml is woefully out of date, and while ROCm with Pytorch on Windows is moving forward, I'm doubtful it will have support for older hardware.

I've noticed the black-square issue too. Interestingly, it seems to happen around step 7 or so. I'm looking to see if I can track down the cause.

JarekDerp commented 10 months ago

For me the black image starts from step 1, but sometimes starts in the middle or even finishes without any problems. I tried Microsoft Olive (onnx) recently and it works quite well. I get about 400%-500% better speeds on the models I optimized - the same speed as normal safetensor version of a given model when running AMD+Linux. I could generate 1024*1024 image in less than 7 seconds on both Olive and Linux on my 6700XT. But unfortunately there's a lot of things you cannot do with Olive at the moment and I hate using Linux.

jtary commented 10 months ago

Ok, seems to be another "you must access this variable before using" bug. You can work around it using this patch to comfy:

--- a/comfy/ldm/modules/attention.py
+++ b/comfy/ldm/modules/attention.py
@@ -523,6 +523,10 @@ class BasicTransformerBlock(nn.Module):
                 if value_attn2 is None:
                     value_attn2 = context_attn2
                 n = self.attn2.to_q(n)
+
+                if n.isnan().any():
+                    raise ValueError("n is nan")
+
                 context_attn2 = self.attn2.to_k(context_attn2)
                 value_attn2 = self.attn2.to_v(value_attn2)
                 n = attn2_replace_patch[block_attn2](n, context_attn2, value_attn2, extra_options)
cubiq commented 10 months ago

this happily falls in "won't fix". DirectML is a joke. If you hate nvidia and want to do ML with AMD, install Linux. It's harsh but true.

jtary commented 10 months ago

Understandable. I can't think of a more 'hacky' fix, than doing an error check to force a lazy-evaluation to resolve because there is a bug that was probably fixed 6 months ago, but direct-ml can't get their shit together. I guess if any other windows/amd folks come around asking about it, this issue serves as documentation of how you can hack a fix in yourself.

Thanks @cubiq for trying to troubleshoot it.

cubiq commented 10 months ago

I could bypass the whole timestepping if I detect dml but I have no way to check if it will ever work, so I dunno. Need to think how to do it properly.

jtary commented 10 months ago

Even if you fix your issue, there is still the black-box NaN issue from Comfy. The real bug is in Torch, and was probably fixed 10 releases ago. Microsoft really needs to release a new version of torch-directml, so that's compatible with the most recent Torch versions. It's not like they don't have the resources to throw a few developers at it.

patientx commented 10 months ago

I understand the decision it is entirely on ms and amd at this point for various problems regarding directml (not only this) but would there be any problems (with the extension or comfyui in any other way) if I continue to use JarekDep's temp solution here , working for me with the latest version this way. https://github.com/cubiq/ComfyUI_IPAdapter_plus/issues/109#issuecomment-1862932994 . Thanks for actually trying to solve this. Hope amd gets their sh*t together.

Keep up the good work !

lord-lethris commented 9 months ago

I had this very problem today after updating ComfyUI and my Nodes.

The Temp Solution in https://github.com/cubiq/ComfyUI_IPAdapter_plus/issues/109#issuecomment-1862932994 also fixed it for me.

aa022 commented 9 months ago

had the same problem, https://github.com/cubiq/ComfyUI_IPAdapter_plus/issues/109#issuecomment-1862932994 seems to work, tysm guys

JarekDerp commented 9 months ago

It's a shame we're not able to fix it for Directml. Over the past few days I've played around with the timestepping running it on the CPU and it's so good for the image quality. You have so much more control regarding how the image will look like. It would be great if someone's good with programming, Directml know-how and AMD card would have a look at it.

gkiryaziev commented 8 months ago

I have the same issue with the RX 6600 and DirectML on Windows. This trick worked for me too https://github.com/cubiq/ComfyUI_IPAdapter_plus/issues/109#issuecomment-1863108998. I'm at the latest commit https://github.com/cubiq/ComfyUI_IPAdapter_plus/commit/46241f3ba5401f076f8d90c2aa85f2194910e1a9 .

sigma = extra_options["sigmas"][0] if 'sigmas' in extra_options else None sigma = sigma.item() if sigma else 999999999.9

gkiryaziev commented 8 months ago

You can make these changes in the code, it won't change the logic, but it will solve the problem with AMD GPU and DirectML.

2024-02-02_214400

cubiq commented 8 months ago

does it work now?

gkiryaziev commented 8 months ago

No, it doesn't work. It's stuck at 0%

1 2

But this is how it works. I removed "is not None" in second line

3 4 5

Here's the result

6

cubiq commented 8 months ago

doesn't make any sense but if that's what it takes...

cubiq commented 8 months ago

actually I have to check if it may erroneously validate with just if sigma

lord-lethris commented 8 months ago

You can make these changes in the code, it won't change the logic, but it will solve the problem with AMD GPU and DirectML.

2024-02-02_214400

This also fixed it for me after another update.

cubiq commented 8 months ago

it's all good but "if sigma" is ambiguous and validates in a plethora of situations. Have to check if it doesn't break anything. I'd rather just assign a value like -1 or maybe check the type

JarekDerp commented 8 months ago

What exactly is this sigma? And in what instances it's "None" that we need to check? Isn't it just information about a tensor's state that's currently working on diffusing the image in the current interference step? I would assume that it's always there, and the more steps it already finished, the closer it gets to value 0 (0 = no interference left in the image) - at least that's how I understand it.

So by doing

sigma = extra_options["sigmas"][0] if 'sigmas' in extra_options else None

sigma = sigma.item() if sigma else 999999999.9

First you are checking if there's a property "Sigmas" in extra_options and load the first instance of the sigma. If there's no Sigmas, then you assign it value "None". Then second step just checks if it's "None" (from the previous if statement) or not ("None" or 0 = false; any other value will return true).

In summary, those two ifs has those outcomes: No Sigma = assign 9999999.9 (telling the diffusor that it's far from finished and needs to diffuse the image) Sigma exists = assign the value of the first sigma found

Isn't that right?

JarekDerp commented 8 months ago

Ah, I just realized that there's a potential problem with it as I was re-reading my message. As sigma approaches 0 in the last steps, it might be that the last couple steps will have sigma value of 0.888 or 0.321 or something like that. I'm not sure if python will round it down to 0 and on the "if sigmas" it will return "False" and apply the 99999.9 - meaning the "end_at" value might be ignored for the last couple steps. I'm not sure how that works in python, but in Excel and PowerShell it works fine, but I don't have python to test it: image image

gkiryaziev commented 8 months ago

So, I've added a few prints to your previous code, to the current code and to my own, without is not None.

9 10

11 12

As we can see the call function is called iteratively and stuck on the second iteration in both cases.

But without is not None the code works and reaches the end.

13 14

Maybe this will help to understand the problem.

cubiq commented 8 months ago

does it work with if isinstance(sigma, torch.Tensor)?

gkiryaziev commented 8 months ago

Unfortunately, that doesn't work either. Still stuck at 0%

1

gkiryaziev commented 8 months ago

I think this is not a type checking issue, I think this is a timing issue, a delay in device initialization, because if I specify print exactly after first line, process goes on, doesn't get stuck at 0%, but result is a black image.

Very strange problem, because torch.Tensor should pass the None check. And why the program changes its behavior just after adding one print operator.

1 2 3 4

sanchitwadehra commented 8 months ago

ok so currently if anyone is facing this issue still, here is the solution that works with latest comfyui - Fix gligen lowvram mode. commits thanks to (JerekDerp) :-

Problem: The original code attempted to access the first element of extra_options["sigmas"] and convert it to a Python number in a single line. However, if extra_options["sigmas"] did not exist or was None, this would raise an error because you can’t access an element of None.

Solution: The solution was to split this operation into two parts. First, we check if extra_options["sigmas"] exists and is not None. If it is, we set sigma to 999999999.9. If extra_options["sigmas"] does exist, we then access the first element and convert it to a Python number. This prevents any attempts to access an element of None, which would raise an error.

Python Code change in IPAdapterPlus.py file at the :-

sigma = extra_options["sigmas"][0] if 'sigmas' in extra_options else None sigma = sigma.item() if sigma is not None else 999999999.9

Here’s the modified part of the code:

Split the operation into two parts

temp_sigma = extra_options["sigmas"] if 'sigmas' in extra_options else None sigma = temp_sigma[0].item() if temp_sigma else 999999999.9

This modification ensures that sigma is always assigned a value, either from extra_options["sigmas"] or the default value 999999999.9, without raising an error. This makes the code more robust and prevents it from crashing due to a NoneType error when extra_options["sigmas"] does not exist or is None.

cubiq commented 7 months ago

can someone try sigma = extra_options["sigmas"].cpu()

JarekDerp commented 7 months ago

can someone try sigma = extra_options["sigmas"].cpu()

I tried this:

        cond_or_uncond = extra_options["cond_or_uncond"]

        sigma = extra_options["sigmas"].cpu()
        print(sigma)

        sigma = extra_options["sigmas"][0] if 'sigmas' in extra_options else None
        sigma = sigma.item() if sigma is not None else 999999999.9

and i got this when i run it: image

so this seems to work.

cubiq commented 7 months ago

okay then, it was pretty easy. Not sure why I haven't thought about it sooner

JarekDerp commented 7 months ago

I haven't tested it thoroughly though. It's possible that running this if statement like it was before might not work even with the .cpu().

The current solution works fine so I'm not sure if it's worth changing it.

cubiq commented 7 months ago

would you mind trying with the previous statement and .cpu()?

cubiq commented 7 months ago

Something like this could work

        sigma = extra_options["sigmas"].detach().cpu().numpy()[0] if 'sigmas' in extra_options else None
        sigma = sigma if sigma is not None else 999999999.9
MythicalChu commented 7 months ago

Something like this could work

        sigma = extra_options["sigmas"].detach().cpu().numpy()[0] if 'sigmas' in extra_options else None
        sigma = sigma if sigma is not None else 999999999.9

I confirm that the above does work