LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

[Suggestion] Configure Automatic1111 generation settings #485

Open VL4DST3R opened 1 year ago

VL4DST3R commented 1 year ago

Expected Behavior

Is it possible to have a bit more control over the generated images via the A1111 API? At the very least to change the resolution or enable some form of upscaling as the images generated are very grainy and low resolution currently. They are good as a gimmick/a bit of flavor to sprinkle in the text, but not really worth "viewing" beyond the thumbnail.

Current Behavior

Besides prefixing the prompt taken from your chat context and changing the model used, you cannot do much else.

LostRuins commented 1 year ago

All images handled by Kobold Lite are downscaled to 256x256 by design. Early tests with larger images showed an unacceptable slowdown of the UI and growth in file size once stories got longer, especially because Kobold Lite embeds all images within the story and saves automatically. The following customizations are possible, which you can try:

  1. Clicking "Styles" inside Kobold Lite allows you to prefix all images with specific keywords. This is helpful for getting consistent outputs. Some examples might be "monochrome, 3d render, sketch, vibrant, HD photo, etc etc"
  2. You're able to select the SD model to use from the dropdown list in the settings panel. It will use your last selected model by default.
  3. You can Long Press the "Add Img" button to manually specify a prompt to use for generating an image.

If you want to do additional stuff like customizing the CFG scale, changing the sampler from Euler A, or changing the number of steps, you'd have to directly modify the payload in the source code for now. No fine-grained configs are currently planned.
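
For anyone who does want to edit the payload by hand, here is a rough sketch of what such a request body might look like, written in Python for readability. The field names follow the A1111 `/sdapi/v1/txt2img` API as far as I know; the helper name, the default values, and the example prompt are illustrative assumptions and not what Kobold Lite actually sends.

```python
import json


def build_txt2img_payload(prompt, style_prefix="", steps=20, cfg_scale=7.0,
                          sampler_name="Euler a", width=512, height=512):
    """Build a request body for A1111's /sdapi/v1/txt2img endpoint.

    Hypothetical helper: the defaults here are placeholders, not Kobold Lite's.
    """
    # Prepend the style keywords, similar in spirit to the "Styles" prefix.
    full_prompt = f"{style_prefix}, {prompt}" if style_prefix else prompt
    return {
        "prompt": full_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "sampler_name": sampler_name,  # valid names depend on the A1111 version
        "width": width,
        "height": height,
    }


payload = build_txt2img_payload(
    "a castle on a hill",
    style_prefix="sketch, monochrome",
    steps=30,
)
# This JSON string is what would be POSTed to <a1111-host>/sdapi/v1/txt2img.
body = json.dumps(payload)
```

Keeping the payload in one small function means a later change to steps, CFG scale, or sampler touches a single place.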

VL4DST3R commented 1 year ago

Yeah, I figured embedding the image inside the text itself was part of the reason. The downscaling to 256 (and, I presume, the very lossy JPEG compression used) would explain the noticeable quality drop; I didn't realize they weren't even 512x512.

1+2. I've already made use of the prefix feature to add a few "quality" tokens, but the subject itself was usually decent; the resulting image resolution/compression quality was my main issue and the reason for making this ticket.

3. That I did not know! Nice. Maybe you should add it as a popup on hover? I had no idea the feature even existed.

Real shame about no plans for a configurable sampler and such, but I understand this is ultimately a gimmick more than anything else. Unless you will revisit this whole topic at a later date, feel free to close this suggestion.

alicat22 commented 1 year ago

A lot of the quality from SD picture generation comes from High Res Fix, so I would love to see that as a potential option.

VL4DST3R commented 1 year ago

Yeah, especially with the new SDXL stuff, it seems to be designed to be a multi-step process to get good results with this technology. That's not to say you don't get anything decent in first pass, but it's clearly not ideal.

LostRuins commented 1 year ago

Should I force high res fix to be true? Are there any downsides?

VL4DST3R commented 1 year ago

Besides slower gen I don't think so. But given your reasoning with not wanting to bloat the file with large images I don't know how much can be achieved this way. Even if we get better images internally, if it still gets crunched down to 256 then you don't really get to enjoy much of it.
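
To put rough numbers on that trade-off, here is a small sketch. It assumes the A1111 txt2img fields `enable_hr`, `hr_scale`, `hr_upscaler`, and `denoising_strength`, a 512x512 base image, and the 256x256 stored size mentioned earlier in the thread; the helper names and the chosen values are hypothetical.

```python
def add_hires_fix(payload, hr_scale=2.0, hr_upscaler="Latent",
                  denoising_strength=0.7):
    """Return a copy of the payload with high res fix switched on."""
    out = dict(payload)
    out.update({
        "enable_hr": True,
        "hr_scale": hr_scale,
        "hr_upscaler": hr_upscaler,  # upscaler names depend on the install
        "denoising_strength": denoising_strength,
    })
    return out


def final_size(payload):
    """Output resolution after the optional second (upscaling) pass."""
    scale = payload["hr_scale"] if payload.get("enable_hr") else 1.0
    return int(payload["width"] * scale), int(payload["height"] * scale)


def fraction_kept(payload, stored_side=256):
    """Share of generated pixels that survives downscaling for storage."""
    width, height = final_size(payload)
    return min(1.0, (stored_side * stored_side) / (width * height))


base = {"prompt": "a castle on a hill", "width": 512, "height": 512}
hires = add_hires_fix(base)

print(final_size(hires))     # (1024, 1024)
print(fraction_kept(base))   # 0.25
print(fraction_kept(hires))  # 0.0625
```

Under these assumptions the second pass produces four times as many pixels, yet only one in sixteen of them survives the downscale, which is the concern raised above.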

Maybe a different solution to store images would be preferred altogether?

EDIT: which actually leads me to something else I noticed: images generated via the API do not get saved on the machine generating them at all? I hoped I could at least retroactively see them at native res in the output folder within SD, but nothing gets saved when generated from the kobold UI.

LostRuins commented 1 year ago

Yeah, I think that's how A1111 works, but maybe there's a setting to override it.

VL4DST3R commented 1 year ago

Indeed it is. Could you maybe expose it within the UI?

nanafy commented 11 months ago

I would love to be able to change the resolution. Especially now with SDXL, it would be a game changer.

djm1176 commented 10 months ago

I'm fairly new to LLM and kobold, so keep that in mind.

  1. Is it reasonable to save generated images in a folder such as "res" (resources) next to the generated story/log, and the story or log references the image by filename? Users then have the option to transmit the story, or story + resources.

  2. Would you be open to pull requests to extend the customizability of the Automatic1111 API in koboldcpp?

LostRuins commented 10 months ago

  1. This should already be done by your image generation backend, such as A1111, which allows saving all generated images to a folder. From the kobold side, the images are compressed (to save space) and embedded into the story, and the original is not preserved. Saving to the local disk from KCPP is not ideal: I don't want to deal with writing to the user's filesystem, which is messy and potentially risks exploits. Instead, you can save via the browser itself by right-clicking the image -> Save As.
  2. Yep, pull requests are welcome, though preferably discuss what you'd like to implement first!
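
As an illustration of why embedded images make the story grow, here is a minimal sketch of embedding image bytes as a base64 data URI. Whether Kobold Lite stores images in exactly this form is an assumption; the helper names and the placeholder bytes are made up for the example.

```python
import base64


def to_data_uri(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URI that can sit inside story text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def embedded_length(n_bytes, mime="image/jpeg"):
    """Characters needed to embed an image of n_bytes as a data URI."""
    header = len(f"data:{mime};base64,")
    # Base64 emits 4 characters for every 3 input bytes, rounded up.
    return header + 4 * ((n_bytes + 2) // 3)


placeholder = bytes(3000)  # stand-in bytes, not a real JPEG
uri = to_data_uri(placeholder)
assert len(uri) == embedded_length(len(placeholder))
```

Base64 adds roughly a third on top of the compressed image size, and every embedded image is carried along each time the story is saved.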

Right now it's already possible to customize step count and CFG scale. In the future, I might consider adding a toggle to enable higher resolution; I have thought of a good way to do this.

LostRuins commented 9 months ago

You can now also save images in "higher res" mode which takes up about double the space. It's not full resolution but it should provide a better compromise.

VL4DST3R commented 9 months ago

Could you add the flag to simply also save it in the local A1111 output folder? The one I linked above. This way there would be no need to fiddle with the ui or worry about lower res images.

LostRuins commented 9 months ago

I've added the flag to Lite as requested. Toggle it in settings.

Though I'd caution against using it on remote servers: images saved remotely this way cannot be deleted by Lite after generation, which means that whoever is running the A1111 server will have persistent access to your previously generated images. If you are using a cloud service like RunPod or Colab, they will have your generated images written to disk too.

Keeping the image in your own stories is safer, you can delete them anytime, and save them to your device at will.
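
A sketch of how a client could apply that caution automatically: only ask the backend to keep a copy when the endpoint is on the same machine. It assumes the A1111 txt2img request accepts a `save_images` boolean; the helper names and the `allow_remote` override are hypothetical and not how Lite implements its toggle.

```python
from urllib.parse import urlparse

# Hostnames treated as "this machine" for the purpose of the sketch.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_endpoint(url):
    """True if the A1111 endpoint URL points at the local machine."""
    return urlparse(url).hostname in LOCAL_HOSTS


def with_server_side_save(payload, endpoint_url, allow_remote=False):
    """Return a copy of the payload, requesting a saved copy only when safe."""
    out = dict(payload)
    out["save_images"] = is_local_endpoint(endpoint_url) or allow_remote
    return out


local = with_server_side_save({"prompt": "a castle"}, "http://127.0.0.1:7860")
remote = with_server_side_save({"prompt": "a castle"}, "https://example.com")
print(local["save_images"], remote["save_images"])  # True False
```

A plain settings toggle leaves the choice to the user; a host check like this one is just one possible default for people who switch between local and hosted backends.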

VL4DST3R commented 9 months ago

> Keeping the image in your own stories is safer, you can delete them anytime, and save them to your device at will.

I know, but when you're hosting both locally it seems like needless tedium. Thanks for adding it!