city96 / ComfyUI_ExtraModels

Support for miscellaneous image models. Currently supports: DiT, PixArt, HunYuanDiT, MiaoBi, and a few VAEs.
Apache License 2.0
394 stars 35 forks

Get depth programmatically #65

Closed GavChap closed 5 months ago

GavChap commented 5 months ago

Get the depth of the model by counting its layers, which allows deeper models to be supported. Example: https://huggingface.co/ptx0/pixart-reality-mix, which is a 900M model.
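
For readers following along, here is a minimal sketch of what "counting layers" could look like. It is not the code from this PR. The key prefixes `blocks.<n>.` and `transformer_blocks.<n>.` are assumptions about how checkpoints name their transformer blocks, and `guess_depth` is a hypothetical helper name.

```python
import re

# Assumption: block weights are named "blocks.<index>.<rest>" or
# "transformer_blocks.<index>.<rest>", optionally behind a dotted prefix.
_BLOCK_KEY = re.compile(r"(?:^|\.)(?:transformer_blocks|blocks)\.(\d+)\.")


def guess_depth(state_dict_keys):
    """Return the number of transformer blocks implied by the key names."""
    indices = set()
    for key in state_dict_keys:
        match = _BLOCK_KEY.search(key)
        if match:
            indices.add(int(match.group(1)))
    # Depth is the highest block index plus one; 0 if no block keys exist.
    return max(indices) + 1 if indices else 0


if __name__ == "__main__":
    # Fake key list standing in for a real checkpoint's state dict keys.
    keys = ["pos_embed"] + [f"blocks.{i}.cross_attn.proj.weight" for i in range(42)]
    print(guess_depth(keys))
```

Because it only reads key names, a helper like this can run before the model is instantiated, so the detected depth could be passed into the model config.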

city96 commented 5 months ago

Looks good. We'll want to add an auto checkpoint select node as well that auto detects/generates the correct config.

That way we can support any size model. In theory, a node like that isn't hard, but one issue I ran into was the pe_interpolation factor, which is not stored in the diffusers state dict.

I think it should be possible to completely get rid of that value by dynamically generating it from the image size similar to how it's done for HunYuanDiT.

I gave it a quick test and it seems to work. Should I try to make those changes I just mentioned on the base repo so we can use this PR for the auto config node as well or should I merge this as-is?

city96 commented 5 months ago

Actually, with that last commit it does seem to fail with the diffusers weights for me since cross_attn.proj.weight is the comfy name and not the diffusers name

GavChap commented 5 months ago

> Actually, with that last commit it does seem to fail with the diffusers weights for me since cross_attn.proj.weight is the comfy name and not the diffusers name

Yes, I accidentally broke something, so I reverted it. I was trying to fix the "missing UNET message", but that doesn't matter as long as the correct layers exist.

GavChap commented 5 months ago

> Looks good. We'll want to add an auto checkpoint select node as well that auto detects/generates the correct config.
>
> That way we can support any size model. In theory, a node like that isn't hard, but one issue I ran into was the pe_interpolation factor, which is not stored in the diffusers state dict.
>
> I think it should be possible to completely get rid of that value by dynamically generating it from the image size similar to how it's done for HunYuanDiT.
>
> I gave it a quick test and it seems to work. Should I try to make those changes I just mentioned on the base repo so we can use this PR for the auto config node as well or should I merge this as-is?

I think merge it as-is, then we can work on a detection node. I've been trying to figure out pe_interpolation, since that should allow inference at any size. I had it working on square! I could gen 2048x2048 from the 1024 model, but as soon as you selected an aspect ratio it went off the wall.

GavChap commented 5 months ago

I've closed it as there are issues I just ran into. I'll reopen it once I've fixed them.

city96 commented 5 months ago

Fair lol, take your time. I'll check on the PE factor stuff, see how hard it is to guess. I assume just doing an average for width+height and then taking the ratio for the base (512?) didn't help?

GavChap commented 5 months ago

> Fair lol, take your time. I'll check on the PE factor stuff, see how hard it is to guess. I assume just doing an average for width+height and then taking the ratio for the base (512?) didn't help?

Nope, didn't help at all. But give it a go and you'll see.

city96 commented 5 months ago

> I could gen 2048x2048 from the 1024 model, but as soon as you selected an aspect ratio it went off the wall.

Doing that seems like it shouldn't work, unless you were doing it the other way around. DiT is notoriously bad at resolutions it wasn't trained on.

Also, I'm able to guess the factor with the formula (x.shape[-1]+x.shape[-2])/2.0 / (512/8.0). Example output: PE scale computed: 2.0625 [vs:2]

I'm thinking maybe a soft-rounding for values that are close to whole integers could work, then leave it up to luck for values outside that lol. Not like the model works outside those anyway.
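
A rough sketch of the guess plus the soft-rounding idea, for illustration only. The function name `guess_pe_interpolation` and the `tolerance` value of 0.1 are assumptions, not something taken from the repo; the formula itself is the one quoted above, with the latent height and width standing in for `x.shape[-2]` and `x.shape[-1]`.

```python
def guess_pe_interpolation(latent_height, latent_width,
                           base_size=512, vae_factor=8, tolerance=0.1):
    """Guess the PE interpolation factor from the latent size.

    tolerance is a hypothetical knob: guesses within this distance of a
    whole number are snapped to it, anything else is returned unchanged.
    """
    base_latent = base_size / vae_factor  # 512 px -> 64 latent units
    scale = (latent_height + latent_width) / 2.0 / base_latent
    nearest = round(scale)
    # Soft rounding: only snap when the guess is already close to an integer.
    if nearest >= 1 and abs(scale - nearest) <= tolerance:
        return float(nearest)
    return scale


if __name__ == "__main__":
    # A 1088x1024 image gives a 136x128 latent with an 8x VAE.
    print(guess_pe_interpolation(136, 128, tolerance=0.0))  # raw formula value
    print(guess_pe_interpolation(136, 128))                 # snapped value
```

With `tolerance=0.0` the function returns the raw formula value, which makes it easy to compare the two behaviours on the same input.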


city96 commented 5 months ago

Pushed an auto checkpoint loader but it needs better logic to get the right config for diffusers, which is missing a bunch of keys that the default one has. I can go into more detail if this is something you'd like to look into. https://github.com/city96/ComfyUI_ExtraModels/commit/de52d3aa45c55958ba138165f7419ca1689edc13


GavChap commented 5 months ago

I've made more changes. I'm not sure the autodetect node works with 900M models, so I'll do some more investigation. I kept getting problems with gens when using the autoconfig plus the new layer code that allows for more depth; the only combo I found that works is forcing the depth in loader.py. I will keep digging, since there is renewed interest in PixArt after SD3's launch.

GavChap commented 5 months ago

In further testing the simple loader is working fine. It was something to do with my setup. I've removed the model_conf override and added a config to the standard loader now so this should be good to merge?

city96 commented 5 months ago

Well, I can confirm that it "works", though it looks like it definitely needs more training lol. Still, good job on this! Thanks!
