Closed RicercarG closed 1 month ago
note this is a preliminary review
issues found
*** Error running process: B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py
Traceback (most recent call last):
File "B:\GitHub\stable-diffusion-webui\modules\scripts.py", line 832, in process
script.process(p, *script_args)
File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py", line 226, in process
setattr(dummy, k, v)
File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py", line 145, in enable_ditail_callback
# reset the model condition stage key
File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\scripts\ditail_ext.py", line 276, in extract_latents
register_time(shared.sd_model.model.diffusion_model, params.sigma[0].item())
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-ditail\ditail\extract_features.py", line 93, in extract
init_latent = model.get_first_stage_encoding(model.encode_first_stage(init_image))
File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 22, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 36, in __call__
return self.__orig_func(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
return self.first_stage_model.encode(x)
File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
h = self.encoder(x)
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 523, in forward
hs = [self.conv_in(x)]
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "B:\GitHub\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same
this is running on an Nvidia GPU
I've also tested running on CPU with fill precision and it seems to work
my guess is that somewhere in the pipeline you have sets a type for a certain tensor which does not match what is used by web UI
2. if you wish to modify a setting value then you should reset the setting to what it was so that is to not affect the user
for example shared.opts.sd_vae_overrides_per_model_preferences you set it to True but, if a user have this set to false then you would inadvertently cause issues for them in the future
if possible don't change the setting, try to achieve the same effect with other means
if changing the setting is necessary then preferably change the setting via `p.override_settings`
if a setting cannot be probably applied using override then change the setting but before changing you must remember the original value and then revert the value back to the original once you're done with it
3. the enable checkbox should not toggle a value at the server side
this is a webpage, meaning of multiple persons can access the same web page at the same time, the way you implemented means that if you have multiple web page or reload web page it will be a desync between the checkbox and the back end value
all inputs should be processed in the Scripts callback during generation
4. (related to 3.) unless there's a special reason when extension is disabled it should do "nothing"
most extensions pass the enable toggle as one of the args and have a `if not enable: retrun` logic at each callback
5. it's not a good idea to provide default image for text image
if you provide a default image then it basically means that every time even when the extension is not using the image the image is still needs to be sent back to the server which uses extra bandwidth and processing power may not be critical when running locally but when people is running on a remote server that doesn't have a good internet connection it can slow down the speed
6. no infotext / image generation parameters metadat
by default when review I generate an image it saves the generation parameters into
the metadata can be read by the UI and used to apply the same settings
https://github.com/w-e-w/sd-webui-infotext-example
I have an example repository showing that how info tax can be implemented
severity
issue 1. 2. are critical is
3. very important
4. 5. 6. is best to have
---
potential danger about using `script_callbacks.remove_current_script_callbacks()`
this is fine if you only have one call back on the same file, but if you add new callbacks in the future you could inadvertently remove callbacks you didn't wish to remove
if you have new callbacks in the future then you want to switch to `script_callbacks.remove_callbacks_for_function(callback_func)`
---
Hi @w-e-w,
Thank you for your feedback!
Regarding the first issue, my development environment is on a Mac with MPS, and I had the --no-half flag enabled by default, which led to the datatype issue being missed during testing.
I’ll address all issues you mention and follow up on this PR as soon as they’re resolved.
my development environment is on a Mac with MPS, and I had the --no-half flag
yes that explains it, when I test on CPU, it also uses the Mac with MPS, and I had the --no-half flag and so works so I'm guessing most likely somewhere in your code hard code some tensor to use full precision
irrc you should refrence these values for types https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/82a973c04367123ae98bd9abdf80d9eda9b910e2/modules/devices.py#L125-L129
also
consider switching to modules.ui_components.InputAccordion
fot the Accordion of the extention
it's basically an enable checkbox + accordion combo custom element
from modules.ui_components import InputAccordion
with InputAccordion(False, label=DITAIL, open=False, elem_id=eid("main_accordion")) as ditail_enable:
# ditail_enable <- this is the checkbox
....
Hi @w-e-w:
I've pushed all changes to main branch of https://github.com/MAPS-research/sd-webui-ditail.git as v0.1.1.
Changes I made according to your issues are as follows: Issue1 - type mismatch: In v0.1.0, I did hard code the input source image as fp32. Now in v0.1.1 the image tensor should be casted to match with webui dtype.
Issue2 - override settings:
Since our extension temporarily 'hijacks' the StableDiffusionProcessing object and then 'restore' it before process_images
function is called, p.override_settings
doesn't work in our case. To not affect the user, I've implemented the code for reverting values back to the original.
Issue3 - value between UI and server not being desynced: I changed the implementation of checkbox, and used gr.State to store the value (also changed the accordion to modules.ui_components.InputAccordion). I tested with two webpage instances, and no conflicts were found.
Issue4 - logic for disabling 'ditail' extension:
Along with the change for issue 3, now our extension also passes the enable toggle, and uses if not enable: return
logic.
Issue5 - default image should be removed: The placeholder image is removed (goodbye our lovely cocktail).
Issue6 - no infotext: Infotext for 'ditail' extension is now shown in the png metadata. Thank you for your tutorial repo, it really saved me lots of efforts.
Misc:
I've replaced script_callbacks.remove_current_script_callbacks()
with script_callbacks.remove_callbacks_for_function(callback_func)
as you suggested.
Thanks again for your comments and resources.
sorry to keep you waiting being occupied with other things recently
anyway the dtype issues still exist among infotext issues, I made a fix PR, hopefully it should be fixed this time
oh and I forgot to ask from my testing prompt doesn't seem to have effect not sure if I did something wrong with something
Hi @w-e-w:
Thanks for your contributions, and Happy Mid-autumn Festival!
I've merged your PRs to the main branch. This is my first time coding for open source projects, and I truly appreciate your patience. In our previous code I didn't pay much attention readability like following the PEP8 style guide and clearing unused imports. I’ll make sure to adhere to these standards moving forward.
For issues of text length and FP8 checkpoints, we've added them to known issues
in our README. We plan to fix them along with the support for SDXL and batch count/size larger than 1. This would take some time and we plan to start working on that during the beginning of October. In the meantime, we hope the current pull request to 'stable-diffusion-webui-extensions' can be merged if this issue is not critical. To our best knowledge, DDIM inversion for SD webui hasn’t been implemented before, and we’re excited to contribute something that could be useful for the community.
oh and I forgot to ask from my testing prompt doesn't seem to have effect not sure if I did something wrong with something
One of the use cases of our extension is transferring a given image to the style of a fine-tuned SD checkpoint. Prompts are used to control for what content the model should 'pay attention' to. We recommend using prompts as a description of the image (clip interrogation works well with our extension), as well as the way of activating LoRAs . This approach retains the overall structure of the input image while allowing the model flexibility in generating the finer details.
Here is an example usage, where we transfer a photo to the anime style of 'Counterfeit' checkpoint.
In some cases, if the effect is too weak, you can try a larger Positive Prompt Scaling (Alpha)
.
中秋節快樂,汗。。。我這沒節日氣氛 今早家人訊息才知道今天中秋節的
哈哈 辛苦了!
Info
https://github.com/MAPS-research/sd-webui-ditail.git
Checklist:
Readme.md
index.json
andextension_template.json
have not been modified.entry
is placed in theextensions
directory with the.json
file extension.