AUTOMATIC1111 / stable-diffusion-webui-extensions

Extension index for stable-diffusion-webui

add sd-webui-ditail to extensions #367

Closed RicercarG closed 1 month ago

RicercarG commented 2 months ago

Info

https://github.com/MAPS-research/sd-webui-ditail.git

Checklist:

w-e-w commented 2 months ago

note: this is a preliminary review

issues found:

  1. first, when trying to use your extension, there seem to be some type mismatch issues:
    
    *** Error running process: B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py
    Traceback (most recent call last):
      File "B:\GitHub\stable-diffusion-webui\modules\scripts.py", line 832, in process
        script.process(p, *script_args)
      File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py", line 226, in process
        setattr(dummy, k, v)
      File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py", line 145, in enable_ditail_callback
        # reset the model condition stage key
      File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py", line 276, in extract_latents
        register_time(shared.sd_model.model.diffusion_model, params.sigma[0].item())
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\ditail\extract_features.py", line 93, in extract
        init_latent = model.get_first_stage_encoding(model.encode_first_stage(init_image))
      File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 22, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 36, in __call__
        return self.__orig_func(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
        return self.first_stage_model.encode(x)
      File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
        h = self.encoder(x)
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 523, in forward
        hs = [self.conv_in(x)]
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "B:\GitHub\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
        return originals.Conv2d_forward(self, input)
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

this is running on an Nvidia GPU
I've also tested running on CPU with full precision and it seems to work
my guess is that somewhere in the pipeline you set a type for a certain tensor which does not match what is used by the web UI

2. if you wish to modify a setting value then you should reset the setting to what it was, so as not to affect the user
for example you set shared.opts.sd_vae_overrides_per_model_preferences to True, but if a user has this set to False you would inadvertently cause issues for them in the future
if possible don't change the setting; try to achieve the same effect by other means
if changing the setting is necessary then preferably change it via `p.override_settings` (see the sketch below)
if a setting cannot be properly applied using an override then change it, but before changing it you must remember the original value and revert it back once you're done with it
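
a rough sketch of the override route (the class and callback placement are illustrative, not taken from your extension; `before_process` runs before the web UI applies the overrides):

from modules import scripts

class ExampleScript(scripts.Script):
    def before_process(self, p, *args):
        # the web UI applies override_settings when the generation starts
        # and restores the user's original values afterwards
        p.override_settings["sd_vae_overrides_per_model_preferences"] = True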

3. the enable checkbox should not toggle a value on the server side
this is a web page, meaning multiple people can access the same page at the same time; the way you implemented it means that if you have multiple pages open, or reload the page, there will be a desync between the checkbox and the backend value
all inputs should be processed in the script's callbacks during generation

4. (related to 3.) unless there's a special reason, when the extension is disabled it should do "nothing"
most extensions pass the enable toggle as one of the args and have an `if not enable: return` check at the start of each callback, as sketched below
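
i.e. something along these lines (names are illustrative):

import gradio as gr

from modules import scripts

class ExampleScript(scripts.Script):
    def title(self):
        return "Example"

    def show(self, is_img2img):
        return scripts.AlwaysVisible

    def ui(self, is_img2img):
        enable = gr.Checkbox(label="Enable", value=False)
        # components returned here are passed to the callbacks as *args, in order
        return [enable]

    def process(self, p, enable):
        if not enable:
            return
        # extension logic runs only when the checkbox is ticked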

5. it's not a good idea to provide a default image for the image input
if you provide a default image then every time, even when the extension is not using it, the image still needs to be sent to the server, which uses extra bandwidth and processing power; this may not be critical when running locally, but when people run on a remote server without a good internet connection it can slow things down

6. no infotext / image generation parameters metadata
by default when the web UI generates an image it saves the generation parameters into the image's metadata
the metadata can be read by the UI and used to apply the same settings
https://github.com/w-e-w/sd-webui-infotext-example
I have an example repository showing how infotext can be implemented
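
the short version is writing your values into `p.extra_generation_params` from the script's callbacks, roughly like this (continuing the sketch above; the parameter names are just examples):

def process(self, p, enable, alpha):
    if not enable:
        return
    # anything added here ends up in the "parameters" PNG metadata / infotext
    p.extra_generation_params["Ditail enabled"] = True
    p.extra_generation_params["Ditail alpha"] = alpha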

severity
issues 1. and 2. are critical
3. is very important
4. 5. 6. are good to have

---

potential danger about using `script_callbacks.remove_current_script_callbacks()`
this is fine if you only have one callback in the file, but if you add new callbacks in the future you could inadvertently remove callbacks you didn't wish to remove
if you do add new callbacks in the future then you'll want to switch to `script_callbacks.remove_callbacks_for_function(callback_func)`, roughly as sketched below
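
a minimal sketch of the difference (the callback name and hook are just examples, not taken from your extension):

from modules import script_callbacks

def example_denoiser_callback(params):
    ...

script_callbacks.on_cfg_denoiser(example_denoiser_callback)

# removes only this specific callback, instead of every callback
# registered from the current file
script_callbacks.remove_callbacks_for_function(example_denoiser_callback)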

---
RicercarG commented 2 months ago

Hi @w-e-w,

Thank you for your feedback!

Regarding the first issue, my development environment is on a Mac with MPS, and I had the --no-half flag enabled by default, which led to the datatype issue being missed during testing.

I’ll address all issues you mention and follow up on this PR as soon as they’re resolved.

w-e-w commented 2 months ago

my development environment is on a Mac with MPS, and I had the --no-half flag

yes that explains it, when I test on CPU it also runs in full precision and so it works, so I'm guessing most likely somewhere in your code you hard-code some tensor to use full precision

iirc you should reference these values for the dtypes https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/82a973c04367123ae98bd9abdf80d9eda9b910e2/modules/devices.py#L125-L129
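
something along these lines should do it (a rough, untested sketch; since the failing call is the VAE encode, devices.dtype_vae is probably the relevant one):

from modules import devices

# cast the source image tensor to the device/precision the web UI is using
init_image = init_image.to(device=devices.device, dtype=devices.dtype_vae)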

also consider switching to modules.ui_components.InputAccordion for the Accordion of the extension

it's basically an enable checkbox + accordion combo custom element

from modules.ui_components import InputAccordion
with InputAccordion(False, label=DITAIL, open=False, elem_id=eid("main_accordion")) as ditail_enable:
    # ditail_enable <- this is the checkbox
    ....
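
since the accordion header doubles as the checkbox, the `ditail_enable` value returned from ui() arrives as one of the args in your callbacks, which pairs naturally with the `if not enable: return` pattern mentioned above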
RicercarG commented 2 months ago

Hi @w-e-w:

I've pushed all changes to the main branch of https://github.com/MAPS-research/sd-webui-ditail.git as v0.1.1.

Changes I made according to your issues are as follows:

Issue1 - type mismatch: In v0.1.0, I did hard-code the input source image as fp32. In v0.1.1 the image tensor is now cast to match the webui dtype.

Issue2 - override settings: Since our extension temporarily 'hijacks' the StableDiffusionProcessing object and then 'restores' it before the process_images function is called, p.override_settings doesn't work in our case. To avoid affecting the user, I've implemented code that reverts the values back to the original.
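
Roughly along these lines (a simplified sketch, not the exact code):

from modules import shared

# remember the original value, change it, and restore it when done
original = shared.opts.sd_vae_overrides_per_model_preferences
shared.opts.sd_vae_overrides_per_model_preferences = True
try:
    ...  # work that needs the overridden setting
finally:
    shared.opts.sd_vae_overrides_per_model_preferences = original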

Issue3 - desync between UI and server values: I changed the implementation of the checkbox and used gr.State to store the value (and also changed the accordion to modules.ui_components.InputAccordion). I tested with two webpage instances, and no conflicts were found.

Issue4 - logic for disabling the 'ditail' extension: Along with the change for issue 3, our extension now also passes the enable toggle and uses the `if not enable: return` logic.

Issue5 - default image should be removed: The placeholder image is removed (goodbye our lovely cocktail).

Issue6 - no infotext: Infotext for the 'ditail' extension is now written into the png metadata. Thank you for your tutorial repo, it saved me a lot of effort.

Misc: I've replaced script_callbacks.remove_current_script_callbacks() with script_callbacks.remove_callbacks_for_function(callback_func) as you suggested.

Thanks again for your comments and resources.

w-e-w commented 1 month ago

sorry to keep you waiting, I've been occupied with other things recently

anyway, the dtype issue still exists, along with some infotext issues; I made a fix PR, hopefully it should be fixed this time

w-e-w commented 1 month ago

oh and I forgot to ask: from my testing the prompt doesn't seem to have an effect, not sure if I did something wrong

RicercarG commented 1 month ago

Hi @w-e-w:

Thanks for your contributions, and Happy Mid-autumn Festival!

I've merged your PRs into the main branch. This is my first time coding for open source projects, and I truly appreciate your patience. In our previous code I didn't pay much attention to readability, like following the PEP 8 style guide and clearing unused imports. I'll make sure to adhere to these standards moving forward.

For the issues with text length and FP8 checkpoints, we've added them to the known issues in our README. We plan to fix them along with support for SDXL and batch count/size larger than 1. This will take some time, and we plan to start working on it at the beginning of October. In the meantime, we hope the current pull request to 'stable-diffusion-webui-extensions' can be merged if these issues are not critical. To the best of our knowledge, DDIM inversion hasn't been implemented for SD webui before, and we're excited to contribute something that could be useful to the community.


oh and I forgot to ask: from my testing the prompt doesn't seem to have an effect, not sure if I did something wrong

One of the use cases of our extension is transferring a given image to the style of a fine-tuned SD checkpoint. Prompts are used to control what content the model should 'pay attention' to. We recommend using prompts as a description of the image (CLIP interrogation works well with our extension), as well as for activating LoRAs. This approach retains the overall structure of the input image while allowing the model flexibility in generating the finer details.

Here is an example usage, where we transfer a photo to the anime style of the 'Counterfeit' checkpoint.

(example image: ditail-example)

In some cases, if the effect is too weak, you can try a larger Positive Prompt Scaling (Alpha).

w-e-w commented 1 month ago

Happy Mid-Autumn Festival! Heh... there's no festive atmosphere over here, I only found out it was Mid-Autumn today from my family's messages this morning

RicercarG commented 1 month ago

Haha, thanks for your hard work!