Brawlence / SD_api_pics

An extension to oobabooga's TextGen allowing you to receive pics generated by Automatic1111's SD API

1) Filter common conversational words and characters, 2) make output imgs cheap by linking, 3) get models from SD-server and let user choose #3

Open kiancn opened 1 year ago

kiancn commented 1 year ago

1) This pull adds a function that filters out conversational words and needless special characters. 2) This pull replaces the direct 'placement' of the image in the chat with a link, but only when saving the generated image is selected. It isn't perfect, but the links do not clutter the chat with large amounts of tokens the way an inline <img ...> tag does.

I had to update the old, out-of-date code here (compared to the version in oobabooga's repo), but the enhancement works the same nonetheless.

I recommend updating the code in this repo to at least match the version in oobabooga's repo. Include my enhancement or don't, but please update the code in the repo marked as experimental, or encourage updates directly in the main oobabooga repo.

kiancn commented 1 year ago

Sorry about the typos in the descriptions :/ Just ask if something is incomprehensible :)

Brawlence commented 1 year ago

The idea has merit.

First, I wanted to argue that repeated substring searches are not performant, but... I'm kinda surprised by the benchmark results. Can't say it manages to achieve good outputs, though: TESTRUN.

The output string is a mess of words broken by commas, sometimes the things left are outright non-descriptive.

Second, I don't believe the header statement; after all, (natural description, picture) pairs are what CLIP was trained on, and Stable Diffusion checkpoints include CLIP, so they should handle natural-language prompts well. Maybe if one uses NovelAI-based checkpoints, then yes: those are less tolerant of natural-language descriptions and respond better to danbooru-style tags instead.

I can't decide yet if the PR is worth it. Check this out in the meantime: there was another PR in the main repo which tried to improve prompt tags in another way.


As for the tokens, this fixes a problem that shouldn't even exist in the first place (if it even exists, because I know for a fact that ooba keeps two histories, one visible in the UI and one hidden for the model itself): non-text inputs should not be forwarded to the model at all. This is relevant not only to SD_api_pics, but also to TTS and similar extensions that use embeddings.
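To make that concrete, here is a minimal sketch of the idea (my own illustration, not code from either repo; the tag pattern and helper name are hypothetical): keep the full reply with the picture in the visible history, but strip the non-text part from the hidden, model-facing history.

```python
import re

# Hypothetical helper: the visible history keeps the picture, the internal
# (model-facing) history never sees the <img> tag at all.
IMG_TAG = re.compile(r'<img[^>]*>')

def split_histories(reply_with_image: str):
    visible_reply = reply_with_image                               # what the UI shows
    internal_reply = IMG_TAG.sub('', reply_with_image).strip()     # what the model sees
    return visible_reply, internal_reply

visible, internal = split_histories('*sends a picture* <img src="file/outputs/0001.png">')
print(internal)  # -> "*sends a picture*"
```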

kiancn commented 1 year ago

Regarding the jumbled collection of words (the 'mess') resulting from filter_out_conversational_words:

Yes, the output string is jumbled, but from a decent range of tests before implementing the filter and more than 100 tests after implementing it, I can safely postulate that a lot of SD models like the jumbled strings better (they give results closer to the intent of the unjumbled text, and the unjumbled text is still displayed to the user). Most SD models I know of have training content tagged with danbooru tags, and I believe this is why it works better with only 'keywords'. It works for me and the models I use, and for a lot of others: check, for example, the tags on the images posted on civitai.com; every single one I have seen is basically word soup like the one you are getting from the filter_out_conversational_words function. I assure you that a lot of people will find these prompts more desirable than full sentences; we could perhaps make it an option in the settings (danbooru-style/deepbooru-style).

Regarding the speed of the function

Yes, the function is relatively slow: 0.74 secs on my PC (running your test code). That is pretty bad, but for this type of application I believe 0.74 seconds for 1000 requests is acceptable. However, by removing the second run of deletions of the substrings_to_remove elements from the string, the test time goes down to 0.43 secs, and the quality of the resulting string is not reduced significantly. Also, you are doing 1000 reps in the test, which means that doing something still insane, but possible, like evaluating 5 strings one after another, would take only 0.0037 secs ((0.74/1000)*5). I don't believe the speed of the code is an issue.
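For reference, here is a minimal sketch of the kind of substring filter and 1000-rep timing loop being discussed. The name substrings_to_remove follows the PR, but the word list, prompt, and benchmark are my own stand-ins, not the actual code:

```python
import re
import timeit

# Stand-in list; the actual substrings_to_remove in the PR is longer.
substrings_to_remove = [
    "here is", "this is", "a picture of", "an image of",
    "please", "could you", "i would like",
]

def filter_out_conversational_words(text: str, second_pass: bool = True) -> str:
    """Crude sketch: drop conversational filler and special characters,
    leaving the comma-separated 'word soup' discussed above."""
    out = text.lower()
    out = re.sub(r"[^\w\s]", " ", out)        # remove needless special characters
    passes = 2 if second_pass else 1          # the optional second run mentioned above
    for _ in range(passes):
        for sub in substrings_to_remove:
            out = out.replace(sub, " ")
    return ", ".join(out.split())

prompt = "Here is a picture of a red fox sitting in fresh snow, please!"
elapsed = timeit.timeit(lambda: filter_out_conversational_words(prompt), number=1000)
print(filter_out_conversational_words(prompt))    # -> "a, red, fox, sitting, in, fresh, snow"
print(f"1000 reps: {elapsed:.3f} s")
```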

The token thing

Yeah, it is really weird. I tested a bit before deciding on the link solution. I regret posting it now, because it is not satisfactory; we really want the images to appear on the same page. But it does 'solve' the problem in a way that is understandable to non-tech users. (And since most users have GPUs with less than 12/16 GB of VRAM, removing these tokens really is a significant optimization.) [Edit: I'm really curious as to why an img with a path source takes up tokens... Does it try to read the image? Base64 I would get, but an img with a plain path source clogging things up is ... funny? Investigating.... ]
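One way to see where the tokens go is to tokenize the tag text itself; the model never reads the image file, it only ever sees the literal tag string, and a long tag/path simply tokenizes into many tokens. A quick illustrative check (the path is made up; any HF tokenizer works and exact counts vary by model):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
img_tag = '<img src="file/extensions/sd_api_pictures/outputs/2023_05_01/Assistant_0001.png">'
print(len(tok(img_tag)["input_ids"]))   # dozens of tokens for a single tag
```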

kiancn commented 1 year ago

Model selection now completely functional in the UI

It seemed like something that was missing. The flow (sketched after the list) is:

  1. It works by getting sdapi/v1/sd_models and filtering the results into a list (called sd_models),
  2. then the currently selected model is read from the response of sdapi/v1/options and saved to params['SD_model'],
  3. and when a model is selected from the dropdown in the UI, a POST request is sent to sdapi/v1/options with the name of the selected model, and params['SD_model'] is updated.
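Roughly, that maps onto the Automatic1111 web API like this. This is a hedged sketch, not the PR code: it assumes a local server on the default address (adjust as needed) and the standard v1 endpoints, and note the endpoint itself is spelled sd-models:

```python
import requests

base = "http://127.0.0.1:7860"   # assumption: adjust to wherever Auto1111 is listening
params = {}

# 1. Fetch the available checkpoints.
sd_models = [m["title"] for m in requests.get(f"{base}/sdapi/v1/sd-models").json()]

# 2. Read the currently selected checkpoint from the server options.
params['SD_model'] = requests.get(f"{base}/sdapi/v1/options").json()["sd_model_checkpoint"]

# 3. On a dropdown change, post the new selection back and remember it.
def select_model(title: str) -> None:
    response = requests.post(f"{base}/sdapi/v1/options", json={"sd_model_checkpoint": title})
    response.raise_for_status()
    params['SD_model'] = title

print(f"current: {params['SD_model']}, available: {len(sd_models)}")
```
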
Brawlence commented 1 year ago

The model selection is currently breaking VRAM management options. Gotta think what's up with that

kiancn commented 1 year ago

That is funky... What happens: an error message, or nothing at all?

However, for me, the VRAM management never worked. I've got responses like this one with every request the extension sends from the give_VRAM_priority function:

```
[...]
  File "C:\aitools\oobabooga-windows\text-generation-webui\extensions\sd_api_pictures\script.py", line 66, in give_VRAM_priority
    response.raise_for_status()
  File "C:\aitools\oobabooga-windows\installer_files\env\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://127.0.0.1:1048/sdapi/v1/unload-checkpoint
```

I have been lazy and assumed that my slightly outdated version of Automatic1111's UI does not have the endpoints reload-checkpoint and unload-checkpoint: ... And I just verified that this is the case.

I'll fix my install so I can at least experience the same error as the up-to-date user :)
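For reference, the VRAM hand-off boils down to two calls against the Automatic1111 API. This is a minimal sketch under my own naming, not a copy of give_VRAM_priority, and it assumes an Automatic1111 build recent enough to expose the checkpoint endpoints; on older installs they 404 exactly as above:

```python
import requests

base = "http://127.0.0.1:1048"   # the API address from the logs above

def vacate_sd_vram() -> None:
    """Ask Auto1111 to drop its checkpoint from VRAM before the text model loads."""
    response = requests.post(f"{base}/sdapi/v1/unload-checkpoint")
    response.raise_for_status()   # 404 here means the install predates the endpoint

def restore_sd_vram() -> None:
    """Ask Auto1111 to load its checkpoint back before an image request."""
    response = requests.post(f"{base}/sdapi/v1/reload-checkpoint")
    response.raise_for_status()
```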

kiancn commented 1 year ago

Ok. I updated the relevant parts of the code (in the Automatic1111 codebase) to be up to date, and now VRAM management works for me. Which puzzles me. I do get a few messages about deprecations in 'torch.py', '_utils.py', and 'storage.py', but it works; the following message is shown exactly once:

```
Prompting the image generator via the API on http://127.0.0.1:1048...
Requesting Auto1111 to vacate VRAM...
Loading mayaeary_pygmalion-6b-4bit-128g...
Found the following quantized model: models\mayaeary_pygmalion-6b-4bit-128g\pygmalion-6b-4bit-128g.safetensors
Loading model ...
C:\aitools\oobabooga-windows\installer_files\env\lib\site-packages\safetensors\torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
C:\aitools\oobabooga-windows\installer_files\env\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
C:\aitools\oobabooga-windows\installer_files\env\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
Done.
Loaded the model in 2.13 seconds.
```

I played around a bit and didn't get the extension to fail.

@Brawlence Could you describe the error you experience?