Looks good. We'll want to add an auto checkpoint select node as well that auto detects/generates the correct config.
That way we can support any size model. In theory, a node like that isn't hard, but one issue I ran into was the pe_interpolation
factor, which is not stored in the diffusers state dict.
I think it should be possible to completely get rid of that value by dynamically generating it from the image size similar to how it's done for HunYuanDiT.
I gave it a quick test and it seems to work. Should I try to make those changes I just mentioned on the base repo so we can use this PR for the auto config node as well or should I merge this as-is?
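For reference, a minimal sketch of what deriving the factor dynamically could look like; the 512px base resolution and the 8x VAE downscale are assumptions for this example, not values read out of the checkpoint:

```python
# Hypothetical helper: derive pe_interpolation from the latent size at runtime
# instead of hard-coding it in the model config. The 512px base resolution and
# the 8x VAE downscale factor are assumptions for this sketch.
def derive_pe_interpolation(latent_h: int, latent_w: int,
                            base_res: int = 512, vae_scale: int = 8) -> float:
    base_latent = base_res / vae_scale               # 64 latent pixels at the base size
    return (latent_h + latent_w) / 2.0 / base_latent

# A 1024x1024 image gives a 128x128 latent, so the factor comes out as 2.0.
print(derive_pe_interpolation(128, 128))             # 2.0
print(derive_pe_interpolation(256, 256))             # 4.0 for a 2048x2048 gen
```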
Actually, with that last commit it does seem to fail with the diffusers weights for me, since cross_attn.proj.weight is the comfy name and not the diffusers name.
Yes, I accidentally broke something, so I reverted it. I was trying to fix the "missing UNET" message, but that doesn't matter as long as the correct layers exist.
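As a sanity check, something like the sketch below could tell the two key layouts apart before loading. Only cross_attn.proj.weight comes from this thread; the attn2.to_out pattern is an assumption about the diffusers-side equivalent:

```python
# Rough sketch: sniff a state dict to see whether it uses the comfy/original
# PixArt key names or diffusers-style names before trying to load/convert it.
# "cross_attn.proj.weight" is the comfy-side key mentioned above; the
# "attn2.to_out" pattern is an assumption about the diffusers-side equivalent.
def detect_key_format(state_dict: dict) -> str:
    keys = list(state_dict.keys())
    if any(k.endswith("cross_attn.proj.weight") for k in keys):
        return "comfy"
    if any("attn2.to_out" in k for k in keys):
        return "diffusers"
    return "unknown"

# Example with fake keys, just to show the intent:
print(detect_key_format({"blocks.0.cross_attn.proj.weight": None}))              # comfy
print(detect_key_format({"transformer_blocks.0.attn2.to_out.0.weight": None}))   # diffusers
```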
I think merge it as-is, then we can work on a detection node. I've been trying to figure out pe_interpolation, since that should allow inference at any size, and I had it working on square images! I could gen 2048x2048 from the 1024 model, but as soon as you selected an aspect ratio it went off the wall.
I've closed it as there are issues I just ran into. I'll reopen it once I've fixed them.
Fair lol, take your time. I'll check on the PE factor stuff and see how hard it is to guess. I assume just averaging width and height and then taking the ratio against the base (512?) didn't help?
Nope, didn't help at all. But give it a go and you'll see.
I could gen 2048x2048 from the 1024 model, but as soon as you selected an aspect ratio it went off the wall.
Doing that seems like it shouldn't work, unless you were doing it the other way around. DiT is notoriously bad at resolutions it wasn't trained on.
Also, I'm able to guess the factor with the formula (x.shape[-1]+x.shape[-2])/2.0 / (512/8.0)
PE scale computed: 2.0625 [vs:2]
I'm thinking maybe a soft-rounding for values that are close to whole integers could work, then leave it up to luck for values outside that lol. Not like the model works outside those anyway.
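A minimal sketch of that soft-rounding applied to the computed factor; the 0.1 tolerance is an arbitrary pick for the example:

```python
# Sketch of the soft-rounding idea: snap the computed PE factor to the nearest
# whole number when it is already close, otherwise leave it alone.
# The 0.1 tolerance is an arbitrary choice for this example.
def soft_round(value: float, tolerance: float = 0.1) -> float:
    nearest = round(value)
    return float(nearest) if abs(value - nearest) <= tolerance else value

print(soft_round(2.0625))   # 2.0 -> matches the "[vs:2]" case from the log above
print(soft_round(2.5))      # 2.5 -> left as-is, outside the tolerance
```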
Pushed an auto checkpoint loader but it needs better logic to get the right config for diffusers, which is missing a bunch of keys that the default one has. I can go into more detail if this is something you'd like to look into. https://github.com/city96/ComfyUI_ExtraModels/commit/de52d3aa45c55958ba138165f7419ca1689edc13
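One possible shape for that logic, sketched under the assumption that whatever can be detected from the state dict just overlays a known-good default config; the field names below are placeholders, not the actual config keys:

```python
# Sketch of patching a partially-detected diffusers config: start from a
# known-good default and only overwrite the keys that could actually be
# detected. The config keys used here are placeholders, not the real fields.
DEFAULT_CONFIG = {
    "depth": 28,
    "hidden_size": 1152,
    "patch_size": 2,
    "pe_interpolation": 2.0,
}

def build_config(detected: dict) -> dict:
    config = dict(DEFAULT_CONFIG)  # copy the defaults
    config.update({k: v for k, v in detected.items() if v is not None})
    return config

# e.g. only the depth could be read from the state dict:
print(build_config({"depth": 42}))
```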
I've made more changes. I'm not sure the autodetect node works with 900M models, so I'll do some more investigation. I kept getting problems with gens when using the autoconfig plus the new layer code that allows for more depth; the only combo I found that works is forcing the depth in loader.py. I will keep digging, since there is renewed interest in PixArt after SD3's launch.
In further testing the simple loader is working fine; it was something to do with my setup. I've removed the model_conf override and added a config to the standard loader, so this should be good to merge?
Well, I can confirm that it "works", though it looks like it definitely needs more training lol. Still, good job on this! Thanks!
Get the depth of the model by counting layers, which allows deeper models to be supported. Example: https://huggingface.co/ptx0/pixart-reality-mix, which is a 900M model.
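A minimal sketch of that counting logic, assuming the original blocks.{i}.* key naming; a converted diffusers checkpoint would need a different prefix:

```python
import re

# Sketch of getting the model depth by counting transformer blocks in the
# state dict. Assumes the comfy/original "blocks.{i}.*" naming; a diffusers
# checkpoint would use a "transformer_blocks.{i}.*" prefix instead.
def count_depth(state_dict: dict, prefix: str = "blocks.") -> int:
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.")
    indices = set()
    for key in state_dict:
        match = pattern.match(key)
        if match:
            indices.add(int(match.group(1)))
    return max(indices) + 1 if indices else 0

# Fake keys just to show the intent: a 2-block model.
fake = {"blocks.0.attn.qkv.weight": None, "blocks.1.attn.qkv.weight": None}
print(count_depth(fake))  # 2
```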