city96 / ComfyUI_ExtraModels

Support for miscellaneous image models. Currently supports: DiT, PixArt, HunYuanDiT, MiaoBi, and a few VAEs.
Apache License 2.0

Intel support for Pixart-Sigma #64

Closed a-One-Fan closed 4 months ago

a-One-Fan commented 5 months ago

This PR adds support for Arc Alchemist GPUs (e.g. A770, A580) for Pixart-Sigma.

city96 commented 5 months ago

I don't have an Intel card to test on, but I'm not sure including a patch like that in a custom node repo is the best idea. Based on the discussion here, there is an older version of that file that will fix it for the rest of Comfy as well. Also, including the patch here would still mean the 4GB limit is present in other places (e.g. the VAE).

Also, for the attention dropout, can you check whether passing it as float(self.attn_drop.p) works? I think that should stop it from modifying the value. That, or we can just set it to zero, since no one will ever be training with this repo lol.

a-One-Fan commented 5 months ago

> I don't have an Intel card to test on, but I'm not sure including a patch like that in a custom node repo is the best idea. Based on the discussion here, there is an older version of that file that will fix it for the rest of Comfy as well. Also, including the patch here would still mean the 4GB limit is present in other places (e.g. the VAE).

I've been able to coast by relatively fine in the rest of Comfy. Yes, the VAE will also hit the 4GB limit, but I can use the tiled VAE instead, and I'd prefer to do that. I've seen that discussion; it's pretty old, and I assumed it would get implemented (especially since the very first post in it directly mentions the need for this), particularly once XPU support landed, but it hasn't happened. No idea what's up with that. I just want it fixed now, and 2048 PixArt Sigma is the main thing bugging me. If it gets fixed upstream, we can revert the commit? Or, if you think that's not a good approach, I'll poke around that thread...

> Also, for the attention dropout, can you check whether passing it as float(self.attn_drop.p) works? I think that should stop it from modifying the value. That, or we can just set it to zero, since no one will ever be training with this repo lol.

Sorry, attn_drop specifically gets modified when Comfy calls ipex.optimize on the model. Maybe I was too hasty looking at the stack trace of what touched it and blamed IPEX when I saw IPEX lines. You can pass an argument to Comfy to skip ipex.optimize on models, but I don't think that should be necessary. Could I replace the ifs with getattr instead?
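
To illustrate what I mean, here is a rough sketch of the two options: the float(self.attn_drop.p) cast versus a getattr fallback that survives ipex.optimize swapping nn.Dropout for nn.Identity. This is simplified, with illustrative names, not the exact module in this repo:

```python
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    # Simplified stand-in for the PixArt attention block.
    def __init__(self, dim, num_heads=8, attn_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # ipex.optimize may replace this nn.Dropout with nn.Identity,
        # which has no .p attribute.
        self.attn_drop = nn.Dropout(attn_drop)

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = (self.qkv(x)
                   .reshape(B, N, 3, self.num_heads, self.head_dim)
                   .permute(2, 0, 3, 1, 4))
        # Option A: float(self.attn_drop.p) -- still breaks if attn_drop
        # became Identity. Option B: getattr with a default, so a missing
        # .p simply means "no dropout":
        drop_p = float(getattr(self.attn_drop, "p", 0.0)) if self.training else 0.0
        out = F.scaled_dot_product_attention(q, k, v, dropout_p=drop_p)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))
```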

city96 commented 5 months ago

Ah yeah, I see what you mean with attn_drop; I was a bit too tired to see what that did when I looked at it last night lol. It's fine either way; you can switch it to getattr, but it's not like this repo has any coding conventions anyway.

Alright, I'll merge it. Just one small request: can you move the ipex folder to utils and maybe credit this repo/user? They seem to be the one who originally submitted it to both sd.next and sd-scripts, so it's as good a link as any to put, I guess.

simonlui commented 5 months ago

I'll chime in here about a few things, since I worked on IPEX support in ComfyUI and found this while getting into other models.

  1. Dropout gets "hijacked" into Identity because the ipex.optimize call that ComfyUI makes by default on Intel GPUs does that as an optimization step; see the replace_dropout_with_identity boolean parameter. The fact that it breaks here is most likely a bug, and I will try to open a bug report in IPEX to see what upstream says about it. (They have since responded: they think dropout shouldn't be used during inference and there is no way to detect this cleanly, so it has to be handled client-side, but the changes here will let you run it regardless.) In any case, turning the parameter to False was enough to run PixArt Sigma, except for the 2K model (see the sketch after this list).
  2. The split attention implementation does indeed work around the 4GB limit on any single allocation in IPEX, but I have found it to be slower than regular SDPA, which IPEX does implement. Realistically it is probably only needed for the 2K model; I have never hit the 4GB limit in ComfyUI recently unless I stacked a ton of LoRAs and went above the 1024x1024 resolution boundary in SDXL. It would be nice if there were some way to detect whether an allocation is going to go over the limit before switching to the split attention implementation.
  3. IPEX will be merged upstream starting with PyTorch 2.4.0, according to discussions in transformers (https://github.com/huggingface/transformers/pull/31238), so in the future it might be possible to avoid an implementation as messy as the current one, relegated to its own folder. I would also like to see IPEX/XPU support added for the other models supported here, but I understand that is somewhat outside the scope of this PR.
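
For reference, the workaround from point 1 looks roughly like this. It is a sketch only, and the replace_dropout_with_identity flag is taken from the IPEX docs, so treat the exact name as an assumption if your IPEX version differs:

```python
import torch
import intel_extension_for_pytorch as ipex

def optimize_for_xpu(model: torch.nn.Module) -> torch.nn.Module:
    model = model.eval().to("xpu")
    # By default ipex.optimize can swap nn.Dropout for nn.Identity as an
    # inference optimization; keeping the flag off preserves attn_drop.p.
    return ipex.optimize(model, dtype=torch.bfloat16,
                         replace_dropout_with_identity=False)
```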

Thanks for the work on this repo, btw; it made it easier to play around with some image diffusion models I couldn't run before.

a-One-Fan commented 5 months ago

The 4GB SDPA workaround being slower is interesting; I haven't noticed much of a performance drop myself, so I'll have to test. At least for now, maybe it's possible to detect whether it's the 2K model and only apply the workaround then. I'd heard that the upstreaming was targeting 2.5 (e.g. https://github.com/pytorch/pytorch/issues/114723, though mostly from other people on the Intel Discord), so I wonder why that mentions 2.4 and whether it's coming sooner than I expected... Nice to see.

a-One-Fan commented 5 months ago

I tested at 1024^2 with the prompt "translucent pig, inside is another pig", getting ~1.015 it/s on average with the 4GB workaround and ~1.265 it/s without. I'd say this is a pretty significant performance difference. Any opinions on how dumb my hack for guessing when to use the workaround is?
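
For context, the guess is roughly along these lines (a simplified sketch, not the exact code in the PR): estimate the size of the attention score matrix and only fall back to splitting when it would exceed the 4GB per-allocation limit.

```python
import torch
import torch.nn.functional as F

FOUR_GB = 4 * 1024 ** 3

def attention_maybe_split(q, k, v, chunks=4):
    # q, k, v: (batch, heads, tokens, head_dim)
    b, h, n_q, _ = q.shape
    n_k = k.shape[-2]
    # Rough size of the full attention score matrix for this call.
    scores_bytes = b * h * n_q * n_k * q.element_size()
    if scores_bytes < FOUR_GB:
        return F.scaled_dot_product_attention(q, k, v)
    # Splitting along the query dimension is mathematically equivalent,
    # since softmax is taken per query row; each call allocates a smaller
    # score matrix at the cost of some speed.
    outs = [F.scaled_dot_product_attention(qc, k, v)
            for qc in q.chunk(chunks, dim=-2)]
    return torch.cat(outs, dim=-2)
```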

a-One-Fan commented 4 months ago

Alright, this should be it?

city96 commented 4 months ago

Looks good to me, thanks!