In addition, I'm also very confused by the masked approach in general, since it now seems to require a base model to be created first? Some example workflows would be fantastic.
The masking used to be hidden under the surface, and didn't work very well. The purpose is to find 'significant' parameters by comparing a model to a base model. For DARE-TIES merging, the mask is optional, but it can be used to protect some parameters from being overwritten. All of the options that were moved to masking were only related to that operation.
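Conceptually, the comparison works something like this (a minimal PyTorch sketch of the idea, not the node pack's actual code; the function name and the 0.9 quantile are just illustrative):

```python
import torch

def significance_mask(model_t: torch.Tensor, base_t: torch.Tensor,
                      quantile: float = 0.9) -> torch.Tensor:
    """Mark the parameters that moved furthest from the base model:
    large |model - base| deltas count as 'significant', and the mask
    can then protect them from being overwritten in a later merge."""
    delta = (model_t - base_t).abs()
    threshold = torch.quantile(delta.flatten().float(), quantile)
    return delta >= threshold  # True = significant / protected
```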
Hmm 🤔 so there is no longer a way to choose? Before, it was quite clear.
Mmmm, kinda, but I still don't get it :(
I added another picture. You can see how the original model image is changed a lot when you do a normal DARE merge, but when you mask the original model it retains a lot more of its original character, because it protected the 'most trained' parameters in the original model by comparing it to the base model (SD 1.5).
https://github.com/54rt1n/ComfyUI-DareMerge/blob/master/examples/daremergepic.png
Yea, I suppose I need to play around with it more. I still can't do a quantile merge between just 2 models though?
Or is that what the masked model is now for? So it requires a base?
I pushed an update and changed the wording around some. If you wanted to do just a quantile merge without doing the DARE-TIES sampling, you could create a mask from the two models you want to merge (so you can find all the changes above/below your quantile), and then just do a regular Masked Model merge. I think it makes more sense the more you play with it.
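Something like this, conceptually (a hedged sketch of a two-model quantile mask feeding a masked merge; the names and the 'above' convention are my assumptions, not the extension's API):

```python
import torch

def quantile_masked_merge(a: torch.Tensor, b: torch.Tensor,
                          quantile: float = 0.9) -> torch.Tensor:
    """Quantile merge with no DARE sampling: build a mask from where
    the two models differ the most, then copy b into a at those spots."""
    delta = (b - a).abs()
    threshold = torch.quantile(delta.flatten().float(), quantile)
    mask = delta >= threshold        # 'above' mask; use < for 'below'
    return torch.where(mask, b, a)   # regular masked model merge
```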
Tyvm for your time and patience. Mostly I'm just trying to wrap my head around where DARE merge fits in. I initially thought it would be a drop-in replacement of sorts for add-difference; turns out it's way more complicated than that. Before, I could merge 2 models with completely different deltas and achieve something similar. It looks like that is no longer the case, or my brain is having trouble understanding the masked sum of 2 parts applied to a target, instead of the old way, with C being the protected model and A/B being a chosen difference of how much more of either you would want.
Again, this was significantly easier to understand than being forced into a repeat merge, even if it was being done in the backend for you.
This is now what makes the most sense to me, but by your example that is incorrect.
The mask selects all of the parameters you want to include in the merge. Since we don't want to overwrite the biggest ones in the model we are merging into, we mask our target model with 'below' at ~90%. That way the parameters we want to keep are excluded from the merge. At least, that's a good place to start.
The math behind DARE-TIES is that you don't need to take all of the parameters in a merge; if you only take some percentage of the likely best ones, the resulting model can have the full capabilities of both, instead of just capabilities 'in the middle' between the two models. You cycle your seed until you find one that captures the part of the second model you were wanting to add.
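Putting the two pieces together, the core of it looks roughly like this (a sketch under my own naming; drop_p, protect_q, and the rescale-by-1/(1-p) step follow the DARE paper's idea, not necessarily this repo's exact implementation):

```python
import torch

def dare_merge(target_t: torch.Tensor, donor_t: torch.Tensor,
               base_t: torch.Tensor, drop_p: float = 0.9,
               protect_q: float = 0.9, seed: int = 0) -> torch.Tensor:
    """Randomly drop ~drop_p of the donor's delta against the base,
    rescale the survivors to keep the expected magnitude, and protect
    the target's biggest parameters with a 'below' mask."""
    g = torch.Generator().manual_seed(seed)
    delta = donor_t - base_t
    keep = torch.rand(delta.shape, generator=g) >= drop_p
    sampled = delta * keep / (1.0 - drop_p)   # DARE rescale
    magnitude = target_t.abs()
    cutoff = torch.quantile(magnitude.flatten().float(), protect_q)
    below = magnitude < cutoff                # True = safe to modify
    return target_t + sampled * below
```

Because the drop is random, each seed keeps a different slice of the donor's delta, which is why cycling the seed can land on a merge that captures the trait you were after.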
So in your example, that is taking the knowledge of sakura with a 90% drop and injecting it into dreamshaper. That part I get; what I don't get is why you protect dreamshaper instead of sakura.
It drops 90% of sakura's parameters and protects dreamshaper because I don't want to overwrite the parameters that are highly significant to the original model's output. By setting my layer weights to zero, it takes 100% of the value of the second model for the small number of parameters that are copied, which can produce some amazing effects.
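In other words (again just a sketch; the weighting convention here is my assumption about what a layer weight of zero does, not confirmed against the node's code):

```python
import torch

def blend_copied(a: torch.Tensor, b: torch.Tensor,
                 copy_mask: torch.Tensor, w: float = 0.0) -> torch.Tensor:
    """Where copy_mask is True, blend a toward b by layer weight w.
    With w = 0.0 the copied parameters take 100% of b's value."""
    blended = w * a + (1.0 - w) * b
    return torch.where(copy_mask, blended, a)
```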
Oooooo okay, I think I understand now! Right, I was kinda on the right track, just got confused by the reverse order compared to an add-difference merge, A + (B - C).
In this way I guess it would be A + protect(C) + a 90% drop of B.
Or something like that. Tyvm again! xD Very interesting stuff, and just a little complicated, but I'll take the time to learn it :D
They can no longer be performed on the sum of 2 models and loaded?