Acly / krita-ai-diffusion

Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
https://www.interstice.cloud
GNU General Public License v3.0

Add a way to generate a prompt from a layer #372

Open · CarlQLange opened 8 months ago

CarlQLange commented 8 months ago

IPAdapter is great, but I find you often need to include terms in your prompt that describe the image style as well. I have a terrible memory for the names of styles, though, and sometimes that breaks my flow a bit. The same thing can happen if I've been tweaking a selection of an image for ages and then want to go back and do another low-strength pass on the whole image: I have to rewrite an entire prompt to match the full image.

It would be pretty handy if there was a way to generate a prompt from the current image or a specific layer. I believe this is possible in ComfyUI already with some custom nodes such as https://github.com/alpertunga-bile/prompt-generator-comfyui; perhaps there is also a simpler way just using BLIP.
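To make the idea concrete, here is a minimal sketch of what a "prompt from layer" helper could look like, assuming the layer can be exported as a PIL image and that Hugging Face `transformers` is available. The model name `Salesforce/blip-image-captioning-base` and both function names are illustrative assumptions, not part of the extension; in practice this would more likely run through a ComfyUI node on the server side.

```python
# Hypothetical sketch: caption an image with BLIP and feed it into the prompt.
# Nothing here is real extension code; names and model choice are assumptions.

def caption_image(image) -> str:
    """Run BLIP captioning on a PIL image and return a short caption."""
    from transformers import BlipProcessor, BlipForConditionalGeneration
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

def merge_into_prompt(caption: str, existing: str = "") -> str:
    """Append the generated caption to whatever the user already typed."""
    caption = caption.strip().rstrip(".")
    if not existing.strip():
        return caption
    return f"{existing.strip()}, {caption}"
```

The merge step matters for the style-recall use case: the caption can be appended to a half-written prompt instead of replacing it.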

daniel-richter commented 8 months ago

You can recall the prompt for each generated image - why doesn't that solve your problem?

CarlQLange commented 8 months ago

Ah, thanks for that note - it solves the second part of my issue, but not everything. For instance, if I want to riff on an existing image from outside Krita (e.g. a photo of my friends, or a sketch I have hand-drawn), I would like to drag and drop the image into Krita and generate a prompt from it rather than needing to describe the entire image myself.

Acly commented 8 months ago

It might also be useful to automatically generate a prompt in certain cases when the prompt field is empty (e.g. for object removal).

The repo you linked seems unrelated though? It does something different. WAS node suite includes a node for BLIP, but it's too big/bloated. How well does BLIP work anyway, and how fast is it?

CarlQLange commented 8 months ago

Ah sorry, I just did a cursory Google search for some nodes and that one looked right, but I didn't look closely.

If I remember correctly, Automatic1111 has built-in BLIP, but yes, it took quite a while and sometimes ran out of memory. Perhaps things have moved on since then.

Yes, object removal is also a very good use case. Sometimes you just want to do the thing rather than spend several minutes writing a prompt.

CarlQLange commented 8 months ago

I had a look at the code in the WAS suite, and the BLIP-specific stuff seemed pretty easy to factor out; there doesn't seem to be much interdependency between the WAS suite functions in general. If I forked it to keep just those nodes (BLIPLoader and BLIPAnalyzeImage) and made sure they worked, would that be enough for you to add this workflow to the extension? Depending on how slow it is, whether it works to a reasonable level, etc.?

Alternatively, there is a WaifuDiffusion Tagger custom node: https://github.com/pythongosssss/ComfyUI-WD14-Tagger - but given its training data, there's obviously quite a risk of "spicy" tags showing up on benign images.
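The "spicy tags" risk could be mitigated with simple post-filtering. A minimal sketch, assuming the tagger returns a `{tag: confidence}` mapping (the blocklist contents and threshold value here are illustrative assumptions, not anything the WD14 node actually ships):

```python
# Hypothetical post-filter for WD14-style tagger output.
# BLOCKED_TAGS and the 0.35 threshold are made-up examples.

BLOCKED_TAGS = {"nsfw", "explicit", "questionable"}

def filter_tags(scores: dict, threshold: float = 0.35) -> list:
    """Keep confident tags that are not blocklisted, highest confidence first."""
    kept = [(tag, conf) for tag, conf in scores.items()
            if conf >= threshold and tag not in BLOCKED_TAGS]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in kept]

def tags_to_prompt(tags: list) -> str:
    """Join tags into the comma-separated form most SD prompts use."""
    return ", ".join(tags)
```

Tag lists also compose more naturally with an existing prompt than a full BLIP sentence does, which might make the tagger route a better fit for the style-recall use case despite the filtering overhead.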