AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

Implement Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis #1417

Open rabidcopy opened 1 year ago

rabidcopy commented 1 year ago

Is your feature request related to a problem? Please describe.
I don't think this is a duplicate of anything else, and it shouldn't be confused with #1325. This is related to the problems showcased in these images, provided by the research paper linked below. Anyone who uses SD on a frequent basis will know some of these issues far too well.

[image: example problem cases from the paper]

Describe the solution you'd like
Implementation of the changes made to txt2img.py and attention.py to reduce these problems in generated images. Obviously this shouldn't replace the default behavior; it should be offered as an opt-in option with plenty of warning that it will produce different results than what is currently produced.

Appropriate links to the research page, paper, and zip file containing their modified txt2img.py and attention.py:
https://openreview.net/forum?id=PUIqjT4rzq7
https://openreview.net/pdf?id=PUIqjT4rzq7
https://openreview.net/attachment?id=PUIqjT4rzq7&name=supplementary_material

C43H66N12O12S2 commented 1 year ago

This seems to rely on parsing the input prompt to separate nouns and tokenize each one separately. I lack any experience with such a thing - though I tried anyway and failed.

It also uses an NLP model and depends on stanza.
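
For anyone who wants to poke at it, the parsing step would look roughly like this with stanza's constituency parser. This is just a minimal sketch, not the paper's actual code, and extract_noun_phrases is a placeholder name I made up:

    import stanza

    # one-time model download, then a pipeline with the constituency parser
    stanza.download('en')
    nlp = stanza.Pipeline('en', processors='tokenize,pos,constituency')

    def extract_noun_phrases(prompt):
        """Collect the text of every NP node in the constituency parse."""
        phrases = []

        def walk(node):
            # leaf nodes carry the word in .label and have no children
            if not node.children:
                return [node.label]
            words = []
            for child in node.children:
                words.extend(walk(child))
            if node.label == 'NP':
                phrases.append(' '.join(words))
            return words

        for sentence in nlp(prompt).sentences:
            walk(sentence.constituency)
        return phrases

    print(extract_noun_phrases("a red car and a white sheep"))
    # e.g. ['a red car', 'a white sheep', 'a red car and a white sheep']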

As far as attention goes, I believe this would be sufficient.

    # in CrossAttention.forward (attention.py): with structured guidance the
    # conditioning arrives as a list [uncond_embedding, {'k': ..., 'v': ...}]
    # so the keys and values can come from different encodings
    if isinstance(context, list):
        uc_context = context[0]
        context_k, context_v = context[1]['k'], context[1]['v']
        k_in = self.to_k(torch.cat([uc_context, context_k], dim=0)) * self.scale
        v_in = self.to_v(torch.cat([uc_context, context_v], dim=0))
    else:
        # stock path: a single conditioning tensor supplies both keys and values
        k_in = self.to_k(context) * self.scale
        v_in = self.to_v(context)

@AUTOMATIC1111 would you be interested in this?

differentprogramming commented 1 year ago

As far as attention goes, I believe this would be sufficient. ... @AUTOMATIC1111 would you be interested in this?

Where would that go? I'd like to try it!

C43H66N12O12S2 commented 1 year ago

@differentprogramming It won't work. The hard part is in txt2img.py.
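
To sketch what I mean - this is only a rough outline against the CompVis scripts, assuming model.get_learned_conditioning and an extract_noun_phrases helper like the one above; the paper's code additionally aligns each phrase encoding to its token span in the full prompt, which I'm skipping here:

    import torch

    def build_structured_context(model, prompt, negative_prompt=""):
        # unconditional / negative conditioning, same as the stock pipeline
        uc = model.get_learned_conditioning([negative_prompt])

        # the full-prompt encoding supplies the attention keys
        c_full = model.get_learned_conditioning([prompt])

        # encode each noun phrase on its own so attributes stay bound to the
        # right object, then fold them into a single value tensor here
        # (the paper keeps them separate and averages the attention outputs)
        phrases = extract_noun_phrases(prompt) or [prompt]
        c_phrases = [model.get_learned_conditioning([p]) for p in phrases]
        c_values = torch.stack(c_phrases, dim=0).mean(dim=0)

        # this list is the shape the modified CrossAttention branch above
        # checks for: [uncond, {'k': key_context, 'v': value_context}]
        return [uc, {'k': c_full, 'v': c_values}]

The returned list would then go to the sampler in place of the usual conditioning tensor, so everything between the sampler call and CrossAttention.forward would need to tolerate a list instead of a tensor.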

differentprogramming commented 1 year ago

I tried to run the sample version but it dies:

    2022-10-01 23:53:04 INFO: Use device: gpu
    2022-10-01 23:53:04 INFO: Loading: tokenize
    2022-10-01 23:53:07 INFO: Loading: pos
    2022-10-01 23:53:08 INFO: Loading: constituency
    2022-10-01 23:53:09 INFO: Done loading processors!
    Traceback (most recent call last):
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\utils\hub.py", line 408, in cached_file
        resolved_file = hf_hub_download(
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\huggingface_hub\file_download.py", line 1099, in hf_hub_download
        _raise_for_status(r)
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\huggingface_hub\utils\_errors.py", line 148, in _raise_for_status
        raise e
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\huggingface_hub\utils\_errors.py", line 111, in _raise_for_status
        response.raise_for_status()
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\requests\models.py", line 1021, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co//resolve/main/preprocessor_config.json (Request ID: UUwBY8TcCL7a11nBMUtFz)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "scripts/txt2img.py", line 35, in <module>
        safety_feature_extractor = AutoFeatureExtractor.from_pretrained(safety_model_id)
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\models\auto\feature_extraction_auto.py", line 292, in from_pretrained
        config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\feature_extraction_utils.py", line 398, in get_feature_extractor_dict
        resolved_feature_extractor_file = cached_file(
      File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\utils\hub.py", line 465, in cached_file
        raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
    OSError: There was a specific connection error when trying to load : 404 Client Error: Not Found for url: https://huggingface.co//resolve/main/preprocessor_config.json (Request ID: UUwBY8TcCL7a11nBMUtFz)

isaac-bender commented 1 year ago

I tried to run the sample version but it dies: ... OSError: There was a specific connection error when trying to load : 404 Client Error: Not Found for url: https://huggingface.co//resolve/main/preprocessor_config.json (Request ID: UUwBY8TcCL7a11nBMUtFz)

That error has nothing to do with the code; you're just failing to connect to huggingface, probably because you didn't supply login info.

Ehplodor commented 1 year ago

Up. This is a must IMHO