a-One-Fan closed this pull request 4 months ago
I don't have an Intel card to test on, but I'm not sure including a patch like that in a custom node repo is the best idea. Based on the discussion here, there is an older version of that file that would fix it for the rest of Comfy as well. Also, including the patch here would still mean the 4GB limit is present in other places (e.g. the VAE).
Also, for the attention drop, can you check if passing it as `float(self.attn_drop.p)` works? I think that should stop it from modifying the value. That, or we can just set it to zero, since no one will ever be training with this repo lol.
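Roughly what I have in mind (just a sketch; the block layout and layer names here are my guess at what the repo's attention module looks like, not its actual code):

```python
import torch
import torch.nn.functional as F

class Attention(torch.nn.Module):
    # Sketch of a ViT-style attention block, for illustration only.
    def __init__(self, dim, num_heads=8, attn_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = torch.nn.Linear(dim, dim * 3)
        self.attn_drop = torch.nn.Dropout(attn_drop)
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        # float() takes a plain Python copy of the value at call time, so
        # nothing that later rewrites the module's value can change what
        # SDPA sees here.
        x = F.scaled_dot_product_attention(
            q, k, v, dropout_p=float(self.attn_drop.p))
        x = x.transpose(1, 2).reshape(B, N, C)
        return self.proj(x)
```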
> I don't have an Intel card to test on, but I'm not sure including a patch like that in a custom node repo is the best idea. Based on the discussion here, there is an older version of that file that would fix it for the rest of Comfy as well. Also, including the patch here would still mean the 4GB limit is present in other places (e.g. the VAE).
I've been able to coast by in the rest of Comfy relatively fine. Yes, the VAE will also hit the 4GB limit, but I can use the tiled VAE instead, and I'd prefer to do that. I've seen that discussion; it's pretty old. I assumed it would get implemented (especially since the very first post in it directly mentions the need for it), and since XPU support got implemented... but it hasn't happened. No idea what's up with that. I just want it fixed now, and 2048 Pixart Sigma is the main thing bugging me. If it gets fixed upstream, revert the commit? Or, if you think that's not good, I'll poke around that thread...
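For anyone reading along, the workaround in this PR is essentially along these lines (a simplified sketch, not the exact code; `sdpa_chunked` and the chunk size are made up for illustration):

```python
import torch
import torch.nn.functional as F

def sdpa_chunked(q, k, v, dropout_p=0.0, chunk=2):
    # Run SDPA in slices along the batch dimension so no single intermediate
    # attention matrix allocation crosses Arc's ~4GB cap. Assumes q, k, v are
    # (batch, heads, seq, head_dim) with matching head_dim.
    out = torch.empty_like(q)
    for i in range(0, q.shape[0], chunk):
        out[i:i + chunk] = F.scaled_dot_product_attention(
            q[i:i + chunk], k[i:i + chunk], v[i:i + chunk],
            dropout_p=dropout_p)
    return out
```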
> Also, for the attention drop, can you check if passing it as `float(self.attn_drop.p)` works? I think that should stop it from modifying the value. That, or we can just set it to zero, since no one will ever be training with this repo lol.
Sorry, `attn_drop` specifically gets modified when Comfy calls `ipex.optimize` on the model. Maybe I was too hasty: I looked at the stack trace of what touched that value, saw IPEX lines, and got mad at IPEX. You can pass an argument to Comfy to not `ipex.optimize` models... but I don't think that should be necessary. Could we replace the ifs with `getattr` instead?
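Something like this is what I mean (a sketch; `attn_dropout_p` is a name I just made up):

```python
import torch.nn.functional as F

def attn_dropout_p(module, training):
    # getattr with a default survives ipex.optimize replacing the Dropout
    # module with something that no longer has a .p attribute; at inference
    # the dropout probability should be 0.0 anyway.
    return float(getattr(module, "p", 0.0)) if training else 0.0

# inside the attention forward, replacing the hasattr/None ifs:
#   x = F.scaled_dot_product_attention(
#       q, k, v, dropout_p=attn_dropout_p(self.attn_drop, self.training))
```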
Ah yeah, I see what you mean with the `attn_drop`; I was a bit too tired to see what that did when I looked at it last night lol. It's fine either way; you can switch it to `getattr`, but it's not like this repo has any coding conventions or anything.
Alright, I'll merge it. Just one small request: can you move the `ipex` folder into `utils`, and maybe credit this repo/user? They seem to be who originally submitted it to both sd.next and sd-scripts, so it's as good a link as any to put, I guess.
I'll chime in here about a few things, since I worked on IPEX support in ComfyUI and found this while getting into other models.
Thanks for the work in this repo, btw; it made it easier to play around with some other image diffusion models I wasn't able to before.
The 4GB SDPA workaround being slower is interesting; I haven't noticed much of a performance reduction myself, so I'll have to test. At least for now, maybe it's possible to detect whether it's the 2K model and only apply the workaround then. I've heard that the upstreaming was planned for 2.5 (see e.g. https://github.com/pytorch/pytorch/issues/114723, though I heard it primarily from other people on the Intel Discord), so I wonder why that issue mentions 2.4 and whether it's coming sooner than I expected... Nice to see.
I tested 1024^2 with the prompt `translucent pig, inside is another pig`, getting ~1.015 it/s on average with the 4GB workaround and ~1.265 it/s without. I'd say this is a pretty significant performance difference.
Any opinions on how dumb my hack is for guessing when to use the workaround?
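The hack is roughly this shape (a simplified sketch; the function name and the threshold handling here are illustrative, not the exact PR code):

```python
def needs_4gb_workaround(q, k):
    # Estimate the attention score matrix size (batch * heads * q_len * k_len
    # elements) and only take the slower chunked path when a single allocation
    # would cross Arc's 4GiB cap. Assumes (batch, heads, seq, head_dim) layout.
    batch, heads, q_len, _ = q.shape
    k_len = k.shape[-2]
    score_bytes = batch * heads * q_len * k_len * q.element_size()
    return score_bytes >= 4 * 1024 ** 3
```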
Alright, this should be it?
Looks good to me, thanks!
This PR adds support for Arc Alchemist GPUs (e.g. A770, A580) for Pixart-Sigma.