[Thread] Descriptions, findings, experimentation.

Corshi commented 3 months ago

So I have been playing with this plugin a lot lately. At the beginning I was actually disappointed, but now I am amazed by all the stability options and customization.

I am taking notes on the various options this plugin provides and their effect on merges.

One thing I do not understand is why merging model A and B with all settings at 0 does not give model B. The output images are always closer to model A, in both composition and style. When I was doing block merges in auto1111's webui, the "borders" of the mixing were clear. Here, it feels like you are always somewhere near the middle between the two models. Other block-merging plugins for ComfyUI seem to have the same problem, so I guess the core calculation is off somewhere...
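
If a merge is a plain per-layer linear interpolation, a weight of 0 should reproduce model B exactly, so one way to find where the "borders" actually land is to diff the merged weights against model B key by key. A minimal sketch, assuming the convention that the weight is the share of model A (so 0 means pure B) and modeling state dicts as plain lists; `lerp`, `merge` and `max_abs_diff` are hypothetical helpers, not part of ComfyUI-DareMerge:

```python
# Hypothetical diagnostic, not part of ComfyUI-DareMerge.
# State dicts are modeled as {layer_name: list_of_floats}; with real checkpoints
# the same idea applies to torch tensors.


def lerp(a, b, w):
    """Element-wise interpolation where w is the share of model A (w=0 gives b)."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


def merge(sd_a, sd_b, weights):
    """Merge layer by layer; layers missing from `weights` default to 0 (pure B)."""
    return {k: lerp(sd_a[k], sd_b[k], weights.get(k, 0.0)) for k in sd_a}


def max_abs_diff(sd_x, sd_y):
    """Largest absolute element difference for every key the two dicts share."""
    return {
        k: max(abs(x - y) for x, y in zip(sd_x[k], sd_y[k]))
        for k in sd_x
        if k in sd_y
    }


sd_a = {"in.0.weight": [1.0, 2.0], "out.weight": [0.5]}
sd_b = {"in.0.weight": [3.0, -1.0], "out.weight": [0.25]}
merged = merge(sd_a, sd_b, {})      # every layer at weight 0
print(max_abs_diff(merged, sd_b))   # {'in.0.weight': 0.0, 'out.weight': 0.0}
```

On a real pair of checkpoints, any layer that still shows a non-zero difference at weight 0 is one that is not being merged the way the setting suggests, which would help narrow down where the pull toward model A comes from.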

BUT! That being said, with the Gradient operations node it is possible to use merge values below 0 and above 1, WHICH I SEE AS A VERY AWESOME THING!!! It is a toy for advanced users, for creating a very strong contrast of styles when mixing models. And I can say that models made with values from -0.5 to 1.5 are still stable (and values beyond that range probably are too).
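
Values outside 0 to 1 make sense if a merge weight is read as a step along the difference between the two models. A minimal sketch of that reading; the function name `extrapolate` is hypothetical, and whether the Gradient operations node computes exactly this is an assumption:

```python
# Hypothetical illustration: a merge weight read as a step along the A-B difference.


def extrapolate(a, b, w):
    """Return b + w * (a - b) element-wise.

    0 <= w <= 1 interpolates between b (w=0) and a (w=1).
    w > 1 overshoots past a, and w < 0 overshoots past b (away from a),
    which exaggerates whatever distinguishes the two models.
    """
    return [y + w * (x - y) for x, y in zip(a, b)]


a = [1.0, 0.0]
b = [0.0, 1.0]
for w in (-0.5, 0.0, 1.0, 1.5):
    print(w, extrapolate(a, b, w))
# -0.5 [-0.5, 1.5]
# 0.0 [0.0, 1.0]
# 1.0 [1.0, 0.0]
# 1.5 [1.5, -0.5]
```

Under this reading, -0.5 and 1.5 sit the same distance outside the usual range on opposite sides, each adding half of the A-B difference beyond one endpoint.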

So now I will share my merging ComfyUI workflow, which has notes on some of the nodes (plus a txt file with just the notes). If the author of this plugin could confirm or deny some of those observations, it would be awesome, as I would like to get the most out of this awesome tool.

MergingBlock.json DARE_Merging_notes.txt

There are also other things. What would be the recommended layer to view when comparing visualizations of models? I tried to make a visual report on:

cond_stage_model.transformer.text_model.embeddings.position_embedding.weight, but it seems that the naming scheme is not the same as the one shown by Netron (https://github.com/lutzroeder/netron).
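
When a tool shows a different name for the same tensor, searching the checkpoint's own key list by the end of the name is a quick way to find the name a visualization node expects. A sketch, assuming a .safetensors checkpoint; `find_keys` and `load_keys` are hypothetical helpers, and only `safe_open` comes from the real `safetensors` library:

```python
# Hypothetical helpers for locating a tensor by the tail of its name.


def find_keys(keys, suffix):
    """Return the sorted keys that equal `suffix` or end with "." + suffix.

    Different tools may add or drop a wrapper prefix (e.g. "cond_stage_model."),
    so matching the end of the name is more forgiving than matching the full path.
    """
    return sorted(k for k in keys if k == suffix or k.endswith("." + suffix))


def load_keys(path):
    """List the tensor names stored in a .safetensors checkpoint."""
    from safetensors import safe_open  # needs the `safetensors` package installed

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


# Example with a hand-written key list instead of a real file:
keys = [
    "cond_stage_model.transformer.text_model.embeddings.position_embedding.weight",
    "model.diffusion_model.out.2.weight",
]
print(find_keys(keys, "embeddings.position_embedding.weight"))
```

With a real file, `find_keys(load_keys("model.safetensors"), "position_embedding.weight")` would list every full name ending in that suffix (the file name here is only a placeholder).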

PS: About the big merging block in the workflow... I use 3 of them to merge two models. I normally do something like "style extraction" and then merge the results. But I removed them from this instance to make it simpler.

Corshi commented 3 months ago

Actually, Gradient Reporting generates details regardless of what it is set to.

54rt1n commented 3 months ago

I'll have to dust off my SD brain here... So, the weights not working correctly is a strange one, since if you are using comfy merging it should pass it in at full strength (https://github.com/54rt1n/ComfyUI-DareMerge/blob/master/components/dare.py#L184). I remember spending a good deal of time tinkering with this, perhaps because of the same unexpected results you are seeing.

> norm: 2nd half of the block (unification and implementation)
> attention: first part of the block (noise interpretation)
> ff_net: middle part of the block (prompt/clip comparison)

I think you are close here. The way I see it, attention 'looks up' the concept in the universe of image facts (the what/where of composition), and then ff_net determines how that should be displayed in the image.
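
To experiment with the attention / ff_net / norm split, it helps to group parameters by which part of the transformer block they belong to and give each group its own weight. A sketch, assuming the original SD1.x (LDM-style) key naming; the regexes and the helper names `component` and `per_component_weights` are hypothetical and may not match how ComfyUI-DareMerge groups layers internally:

```python
import re

# Assumed SD1.x (LDM-style) key naming inside the UNet's transformer blocks, e.g.
#   ...transformer_blocks.0.attn1.to_q.weight    -> "attention"
#   ...transformer_blocks.0.ff.net.0.proj.weight -> "ff_net"
#   ...transformer_blocks.0.norm2.weight         -> "norm"
_COMPONENTS = [
    ("attention", re.compile(r"\.attn\d+\.")),
    ("ff_net", re.compile(r"\.ff\.net\.")),
    ("norm", re.compile(r"\.norm\d*\.")),
]


def component(key):
    """Return which part of a transformer block a parameter name belongs to."""
    for name, pattern in _COMPONENTS:
        if pattern.search(key):
            return name
    return "other"


def per_component_weights(keys, weights, default=0.0):
    """Give every key the merge weight of its component (or `default`)."""
    return {k: weights.get(component(k), default) for k in keys}


prefix = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0."
keys = [prefix + "attn1.to_q.weight", prefix + "ff.net.0.proj.weight", prefix + "norm2.weight"]
print(per_component_weights(keys, {"attention": 0.3, "ff_net": 0.7, "norm": 0.5}))
```

Keys outside the transformer blocks (for example ResBlock `in_layers`) fall into "other" and get the default weight.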

> Doc has it at 0.90, but the real recommended value for SD is 0.10, because we just want to remove the garbage data to keep it stable, and keep the mid data to blend, etc. (depending on needs)

Since it's just doing random selection, it's not removing any particular data, just some random percentage. I have mixed feelings about how you should use these hyperparameters; my original method was to sample a small number of parameters and patch them in at 0.95-0.99 weight, and it was pretty good at bringing in the distinctive features of a model.
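
The point that random dropping does not target any particular data can be seen in a toy version of the general DARE (Drop And REscale) idea. A sketch on a flat list of weights, not the plugin's implementation; the function name `dare_merge` and its signature are hypothetical:

```python
import random


def dare_merge(base, other, drop_rate, seed=0):
    """DARE-style (Drop And REscale) merge of one flat list of weights.

    1. delta = other - base for every element.
    2. Drop each delta independently with probability `drop_rate`. The mask is
       purely random, so no particular kind of weight is targeted.
    3. Rescale surviving deltas by 1 / (1 - drop_rate) so the expected delta is
       unchanged, then add them back onto `base`.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    merged = []
    for b, o in zip(base, other):
        if rng.random() < drop_rate:
            merged.append(b)                    # delta dropped: keep the base weight
        else:
            merged.append(b + (o - b) * scale)  # delta kept and rescaled
    return merged


base = [0.0] * 1000
other = [1.0] * 1000
for p in (0.1, 0.9):
    out = dare_merge(base, other, p, seed=42)
    kept = sum(1 for v in out if v != 0.0)
    print(p, "kept:", kept, "scale:", round(1.0 / (1.0 - p), 3), "mean:", round(sum(out) / len(out), 3))
```

In this sketch the rescale factor differs a lot between the two drop rates mentioned above: at 0.90 the surviving tenth of the deltas is multiplied by about 10, while at 0.10 it is multiplied by only about 1.11, which may be part of why the two settings behave so differently.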

This was the case at least under SD1.5. On the denser models that have come out since, my testing shows that DARE is much less effective than it had been, since the 'mesh' of weights is more interdependent and perturbing the matrix introduces too much entropy; so your method may be better for SDXL et al.

I think all your observations are good though. Thanks for using my tool!

Corshi commented 3 months ago

Thanks for the answer, it does help me understand some things. Good luck with your next projects!

Corshi commented 3 months ago

I have more questions now.

Does model_mask protect areas from being dropped by drop_rate, from being overwritten by the other model, or both?

model_mask can be created from either model, but in the end it can only be used on one model at a time, right? Or is it always set for model_b? It records the names of the layers to protect, so even if it is created from model_a, it only records the names. AND since the naming convention for layers is standardized within an SD release, it could have the reverse effect of NOT protecting the data I wanted. If so, why is there no option to do this for both models?

Now, if I understood correctly, gradient does "merge weights for layers", but like model_mask, it only focuses on one model (model_a), and model_b's weights are scaled automatically, right? ...Again, why is there no option to do that for both models? Logically it seems like this approach makes a lot more sense, but Checkpoint merge in https://github.com/sipherxyz/comfyui-art-venture does exactly that. It lets you set the weights for both models separately, in a range of -1 to 1, and it provides a unique way to extract the difference between models. It does let users break things more often, but once you know what you are doing, it adds a bit of flexibility.

Does rescale for CLIP involve fixing the values of embeddings.position_ids so they are INT again, instead of float values?
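
The separate-weights idea, the name-based mask and the position_ids question can be illustrated together. A sketch, assuming state dicts are plain {name: flat list of floats}; `two_weight_merge`, `fix_position_ids` and the default key name are hypothetical, not the API of ComfyUI-DareMerge or comfyui-art-venture. Because the mask is just a set of names, it protects the same-named layer regardless of which model the names were collected from:

```python
# Hypothetical sketch, not the API of ComfyUI-DareMerge or comfyui-art-venture.
# State dicts are {name: flat list of floats}; real ones hold torch tensors.

POSITION_IDS = "cond_stage_model.transformer.text_model.embeddings.position_ids"  # assumed key


def two_weight_merge(sd_a, sd_b, w_a, w_b, protected=frozenset()):
    """merged = w_a * A + w_b * B per element, except for protected names.

    `protected` is only a set of layer names, so it protects the same-named
    layer no matter which model the names were collected from; here protected
    layers (and layers missing from B) are copied from A unchanged.
    """
    merged = {}
    for name, a in sd_a.items():
        if name in protected or name not in sd_b:
            merged[name] = list(a)
        else:
            merged[name] = [w_a * x + w_b * y for x, y in zip(a, sd_b[name])]
    return merged


def fix_position_ids(sd, key=POSITION_IDS):
    """Reset CLIP position_ids to the integers 0..n-1 after a float merge.

    Assumes the original tensor held exactly 0..n-1, as in standard CLIP text encoders.
    """
    if key in sd:
        sd[key] = list(range(len(sd[key])))
    return sd


sd_a = {"blk.attn1.weight": [3.0, 1.0], "blk.norm1.weight": [1.0], POSITION_IDS: [0.0, 1.0, 2.0]}
sd_b = {"blk.attn1.weight": [1.0, 1.0], "blk.norm1.weight": [0.5], POSITION_IDS: [0.0, 1.0, 2.0]}

# w_a = 1, w_b = -1 extracts the difference A - B for every unprotected layer.
diff = two_weight_merge(sd_a, sd_b, 1.0, -1.0, protected={"blk.norm1.weight"})
print(diff["blk.attn1.weight"], diff["blk.norm1.weight"], diff[POSITION_IDS])
# [2.0, 0.0] [1.0] [0.0, 0.0, 0.0]
print(fix_position_ids(diff)[POSITION_IDS])
# [0, 1, 2]
```

The difference merge also turns position_ids into all zeros, which is the kind of damage a CLIP position_ids fix would need to undo.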

Not pushing you to add/change things. Thanks for all your work! I am just curious about things.

54rt1n / ComfyUI-DareMerge

[Thread] Descriptions, findings, experimentation. #12