Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[Feature Request] Control model offsets and some others #73

Closed lllyasviel closed 1 year ago

lllyasviel commented 1 year ago

Hi, great plugin. I played with it a bit and the memory optimization is really good. A few things should be given higher priority:

  1. The requirements for some modes, like segmentation, are broken.

  2. We really need a button to offset the weights inside the ControlNet: that will immediately solve the distorted faces in pose mode and the distorted edges in canny mode. This can be done by offsetting the weight copy inside the ControlNet using the user's model and SD15. Right now all controls are targeted at SD15; they should be retargeted to the user's model.

lllyasviel commented 1 year ago

Or at least we should provide some control_any3_X.pth files for anime models. I cannot release these officially because of many considerations, but this can be done by third-party projects. Still, a button that computes it directly from the user's model is better.

lllyasviel commented 1 year ago

The formulation is:

Any3.control_model.weights 
= SD15.control_model.weights + Any3.model.diffusion_model.weights - SD15.model.diffusion_model.weights
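
As a rough illustration of that formula in code, here is a minimal sketch operating on raw state dicts; the file names and the key prefixes ("control_model." vs. "model.diffusion_model.") are assumptions and may not match the real checkpoints exactly:

```python
import torch

# Assumed file names; the real checkpoints may differ.
sd15_control = torch.load("control_sd15_canny.pth", map_location="cpu")
sd15_base = torch.load("v1-5-pruned-emaonly.ckpt", map_location="cpu")["state_dict"]
any3_base = torch.load("anything-v3.ckpt", map_location="cpu")["state_dict"]

any3_control = {}
for key, w in sd15_control.items():
    # Assumed key layout: control weights under "control_model.", the matching
    # UNet weights in a full checkpoint under "model.diffusion_model.".
    base_key = key.replace("control_model.", "model.diffusion_model.", 1)
    if base_key in sd15_base and base_key in any3_base:
        any3_control[key] = w + any3_base[base_key] - sd15_base[base_key]
    else:
        # Layers that exist only in the ControlNet (e.g. the hint encoder) are copied as-is.
        any3_control[key] = w

torch.save(any3_control, "control_any3_canny.pth")
```
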
lllyasviel commented 1 year ago

or we can directly store

SD15.control_model.weights - SD15.model.diffusion_model.weights

and any time the user loads a model, add the base weights from the user's model.
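
A matching sketch for this variant, under the same assumed key layout as above: the difference is computed once against SD15 and stored, independent of any particular user model:

```python
import torch

sd15_control = torch.load("control_sd15_canny.pth", map_location="cpu")
sd15_base = torch.load("v1-5-pruned-emaonly.ckpt", map_location="cpu")["state_dict"]

diff = {}
for key, w in sd15_control.items():
    base_key = key.replace("control_model.", "model.diffusion_model.", 1)
    # Subtract the SD15 base weight where one exists; keep ControlNet-only layers as-is.
    diff[key] = w - sd15_base[base_key] if base_key in sd15_base else w

torch.save(diff, "control_canny_diff.pth")
```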

lllyasviel commented 1 year ago

Updated https://github.com/lllyasviel/ControlNet/discussions/12

lllyasviel commented 1 year ago

This should be considered early, since this plugin seems super popular in both the English and Asian communities. The earlier this is fixed, the less we will need to ask people to download new files.

Mikubill commented 1 year ago

Thanks for pointing this out. Will implement a fix immediately.

Mikubill commented 1 year ago

Fixed in https://github.com/Mikubill/sd-webui-controlnet/commit/b9efb602e5c6851e86d72d302e9592ddc0230d53; it should work but still needs some tests.

lllyasviel commented 1 year ago

Seems broken, not working anymore. [image]

catboxanon commented 1 year ago

Seems to be working for me with this most recent change. I got "Offset cloned: 298 values" in my console. [image]

lllyasviel commented 1 year ago

Great. Now working. But how is the current offset computed? I do not even have sd15 in my webui.

catboxanon commented 1 year ago

By the way, this is what it looks like when unchecking "Apply transfer control when loading models" in settings, just to confirm there is a difference. Same seed. [image]

lllyasviel commented 1 year ago

The non-transfer result looks even better? Perhaps because any3 is bad at drawing houses.

lllyasviel commented 1 year ago

By the way, what is the best practice for developing a webui extension? Opening a private GitHub repo? Writing code directly in the webui folder?

Mikubill commented 1 year ago

Hmm, looks like transferring control brings some instability. Temporarily disabled; it can be re-enabled in the settings.

brunogcar commented 1 year ago

By the way, what is the best practice for developing a webui extension? Opening a private GitHub repo? Writing code directly in the webui folder?

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Developing-extensions

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions

kohya-ss commented 1 year ago

Hi, thank you for the great work!

or we can directly store

SD15.control_model.weights - SD15.model.diffusion_model.weights

and any time the user loads a model, add the base weights from the user's model.

I implemented control transfer with this approach. The difference is calculated in advance and stored in a file. The implementation is here:

https://github.com/kohya-ss/sd-webui-controlnet-lora/tree/support-lora

extract_controlnet_diff.py computes the difference and saves the state_dict with the key difference as a marker, and cldm.py handles it on the fly.
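
Roughly, the load-time handling could look like the sketch below; the exact marker key and state-dict layout in the actual PR may differ, so this only illustrates the idea rather than the real cldm.py code:

```python
import torch

def load_control_state_dict(path, user_model_state_dict):
    """Load a ControlNet file; if it is a difference file, re-add the user's base weights."""
    sd = torch.load(path, map_location="cpu")
    if "difference" not in sd:  # assumed marker key; a fully merged model is returned unchanged
        return sd
    sd.pop("difference")
    merged = {}
    for key, w in sd.items():
        base_key = key.replace("control_model.", "model.diffusion_model.", 1)
        if base_key in user_model_state_dict:
            merged[key] = w + user_model_state_dict[base_key]  # add the user model's UNet weight
        else:
            merged[key] = w  # ControlNet-only layer, kept as-is
    return merged
```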

I can make a pull request, and I also think you can easily copy the code from there. Please feel free to modify it.

(Please forget the name of the repo. I intended to support ControlNet with LoRA...)

lllyasviel commented 1 year ago

I am working on these. But I have also found that in some cases, non-transferred models even have better performance. Very weird. I'm trying to understand. We still know too little about neural networks.

Mikubill commented 1 year ago

I can make a pull request, and I also think you can easily copy the code from there. Please feel free to modify it.

Feel free to make a PR.

kohya-ss commented 1 year ago

I've made the PR #80 :)

Mikubill commented 1 year ago

Merged. This method seems to be more correct and effective. Btw, any samples?

kohya-ss commented 1 year ago

Thank you for merging!

As lllyasviel mentioned above, non-transferred models sometimes seem to have better or almost the same performance. In my test, openpose seems to generate almost the same image.

But canny and scribble are slightly better with transfer (a slightly crisper image in canny, an improved background in scribble). I'm using ACertainty for the test.

Please let me know if you need image files or more samples.

Canny (prompt: masterpiece, best quality, 1girl in kimono, upper body, looking at viewer, in forest, flowers):

Without transfer: [image]

With transfer: [image]

Scribble (prompt: masterpiece, best quality, 1boy in business suit, kung-fu pose in street):

Without transfer: [image]

With transfer: [image]

lllyasviel commented 1 year ago

Perhaps those anime models are trained too much in the anime domain and forget many general object/context concepts, and ControlNet without transferring accidentally brings those general concepts back.

kohya-ss commented 1 year ago

I've uploaded pre-made difference files. https://huggingface.co/kohya-ss/ControlNet-diff-modules/tree/main

I have checked each file by generating an image with the extension, but it would be good to double-check just to be sure.

ljleb commented 1 year ago

Should the code be tweaked to accept difference models along with (or instead of) full control checkpoints? It would save some disk space for some people, and be lighter to move around/download, etc.

I'm just not sure how to distinguish diff models vs. complete checkpoints programmatically.
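
One simple heuristic, assuming difference files carry a marker key as in the approach above while full control checkpoints do not:

```python
import torch

def is_difference_checkpoint(path: str) -> bool:
    # Assumed convention: difference files contain a "difference" marker key.
    sd = torch.load(path, map_location="cpu")
    return "difference" in sd
```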

Mikubill commented 1 year ago

Difference models should load without any issues. Feel free to open a new issue if it doesn't work.

CCRcmcpe commented 1 year ago

IMO, doing a weights merge will not fix the bad generation quality, and it may perform even worse. This is like saying human + w * (human - ape) = more intelligent human. Training a proper model based on any3 (etc.) is the way to go.

Sansui233 commented 1 year ago

Thanks for your great work! I tested my sketch on the pre-made extracted canny models with Anything-v3-fp16. It's weird that the non-diff models always work much better on details and look more like "Anything's style". I don't understand much about why; I'm just posting the feedback.

Original sketch: [image]

ControlNet result (prompt: masterpiece, best quality, illustration, face, right hand, atomespheric, cold, sunshine, sky, high detail, flowers and leafs): [image]

paulo-coronado commented 1 year ago

Hello @Mikubill, @lllyasviel or @kohya-ss. Could you please clarify one thing for me? In this project, are you replacing the ControlNet base model with the user model (e.g. AnythingV3), or are you doing some other operation? I don't understand how Transfer Control happens "on the fly" without needing to generate a new merged model (e.g. AnythingV3ControlCanny.pth).

Could you please explain how "cldm.py handles it on the fly"? How is the user model merged with the SDControl model? Thanks!

haofanwang commented 1 year ago

@plcpinho Check my reply here. "On the fly" doesn't mean there is no merging; it merges at load time once the model is updated, without saving the merged model locally.

tavihalperin commented 1 year ago

@haofanwang Thanks for the clarification on this! So the A1111 "on the fly" output should match the output of the statically merged model, right? Why do the results look better with this approach?

catboxanon commented 1 year ago

IMO, doing a weights merge will not fix the bad generation quality, and it may perform even worse. This is like saying human + w * (human - ape) = more intelligent human. Training a proper model based on any3 (etc.) is the way to go.

@CCRcmcpe I think I understand what you're saying, but it's been shown in other cases that doing this type of merge, as described in https://github.com/Mikubill/sd-webui-controlnet/issues/73#issuecomment-1431836942, drastically improves results. Take a look at this: https://old.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/

[images]