CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Transparent backgrounds for output instead of black color for sprites generation (Is that possible?) #304

Open toasterhead-master opened 2 years ago

toasterhead-master commented 2 years ago

Hi, it appears that every time I post a .png image with a transparent background to img2img, it always outputs black instead of the transparent background. I couldn't find anything related to this issue; it seems like nobody is talking about it. Is it even possible? It would really help with creating sprites without the need to manually edit out the background and damage the looks in the process.

CaptDoodle commented 2 years ago

Just encountered the same issue myself cropping and upscaling multiple images to add together. I couldn't see an option within the WebUI I use, but I imagine there would be some prompt that could force the output to keep the transparent background.

tskazinski commented 2 years ago

I would also like to know this. Or even whether it's possible to choose the transparent color, so it's something not found in the image and can easily be turned transparent afterwards.

Tripsette commented 2 years ago

Came to ask the same thing. I'm having trouble even figuring out how to get it to change a flat background color to a different background color not present in the image.

wendten commented 2 years ago

Transparency operates on a fourth pixel channel; the other three are red, green, and blue. Stable Diffusion is only trained on 3 channels and therefore has no encoded knowledge of transparency. So with the current 1.4 model that is not possible. Maybe future models will be 4-dimensional or even 5, where the fifth would be a time dimension for animated images like GIFs.
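To illustrate the channel mismatch described above, here is a minimal pure-Python sketch (not the actual model code) of what happens when a four-channel RGBA sprite enters a three-channel pipeline:

```python
# Minimal sketch: an RGBA sprite has four channels per pixel, but a
# 3-channel pipeline simply discards the fourth (alpha) channel.
def drop_alpha(rgba_pixels):
    """Keep only R, G, B. Fully transparent pixels usually store (0, 0, 0)
    in their colour channels, so they come back as opaque black."""
    return [(r, g, b) for (r, g, b, a) in rgba_pixels]

sprite = [
    (255, 0, 0, 255),  # opaque red pixel
    (0, 0, 0, 0),      # fully transparent pixel (colour stored as black)
]
print(drop_alpha(sprite))  # [(255, 0, 0), (0, 0, 0)] -- transparency turned black
```

This matches the behaviour reported at the top of the thread: transparent regions come out black because their colour channels are typically stored as black once alpha is gone.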

CaptDoodle commented 2 years ago

Awesome, good to know it's a limitation of the current version and not my novice abilities. Thanks for the response.


dk205 commented 1 year ago

I would also like this feature.

i00 commented 1 year ago

Same here ... would be good to create icons for developers with no artistic skill :P

dotRelith commented 1 year ago

The way I found for my pics to have a transparent background is to use a site like https://www.remove.bg (and set a solid white/green background in the prompt).
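The solid-background trick can also be post-processed locally instead of through a web service. A minimal chroma-key sketch over plain pixel tuples (the key colour and tolerance are made-up parameters, not anything from Stable Diffusion itself):

```python
def chroma_key(rgb_pixels, key=(0, 255, 0), tolerance=30):
    """Turn every pixel within `tolerance` of the key colour fully
    transparent; everything else becomes fully opaque RGBA."""
    out = []
    for (r, g, b) in rgb_pixels:
        # Per-channel distance from the key colour (green screen by default).
        dist = max(abs(r - key[0]), abs(g - key[1]), abs(b - key[2]))
        out.append((r, g, b, 0 if dist <= tolerance else 255))
    return out

pixels = [(10, 250, 5), (200, 40, 40)]  # green backdrop pixel, subject pixel
print(chroma_key(pixels))  # [(10, 250, 5, 0), (200, 40, 40, 255)]
```

This is the "choose the transparent color" idea suggested earlier in the thread: generate against a colour not present in the subject, then key it out afterwards.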

Nazosan commented 1 year ago

> Transparency operates on a fourth pixel channel; the other three are red, green, and blue. Stable Diffusion is only trained on 3 channels and therefore has no encoded knowledge of transparency. So with the current 1.4 model that is not possible. Maybe future models will be 4-dimensional or even 5, where the fifth would be a time dimension for animated images like GIFs.

I've been wondering about this myself. I don't think it's a model training thing, though; I think it's more fundamental to the software itself. For starters, there's no real training needed to set a pixel as transparent, since it's basically as simple as "anything that is background is transparent," and as long as it understands what a background is, it can do that. We're basically talking about zeros and ones there. (I realize there can be 8 bits per pixel in the alpha channel, but for what we're discussing, even just one bit is better than none.)

I think the biggest problem is that Stable Diffusion itself doesn't really understand transparency. It outputs 24-bit files even if you set it to output PNGs; there isn't even a blank alpha channel present. That, in itself, is annoying, because I have to manually add an alpha channel before I can do things like deleting the background for multi-layer work. Most tools, even if they don't understand alpha transparency, at least put in a blank alpha channel, but this doesn't even do that much. Even without it understanding transparency, I wish it would at least output 32-bit images. (Since PNG is compressed by default, an alpha channel that is 100% opaque really wouldn't add significantly to the file size, so it wouldn't hurt anything in any meaningful way. Anyone who really is worried about file sizes doesn't want the default compression built into the software anyway; they want to run a PNG optimizer after the fact.)
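Adding that missing (fully opaque) alpha channel after generation is a one-liner with Pillow; a sketch, assuming the generated image and output path are stand-ins:

```python
from PIL import Image

# Stable Diffusion's PNGs are 24-bit RGB; give them a fully opaque alpha
# channel so later editing (e.g. erasing the background) works directly.
img = Image.new("RGB", (64, 64), (30, 30, 30))  # stand-in for a generated image
img.putalpha(255)  # converts in place to RGBA, alpha = fully opaque
print(img.mode)    # RGBA
# img.save("output_rgba.png")  # hypothetical path; PNG preserves the alpha channel
```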

Ultimately I think it comes down to Stable Diffusion itself needing to have a concept of transparency more than anything. I don't know the exact specifics of how this all works and am not a programmer, but I think it might be a matter of something like getting it to recognize a pixel as "background" or not and when "transparent background" or similar is specified simply discarding any background pixels the models output. I don't know how possible that is or is not, but I do think it could seriously help in a lot of things such as mixing images. Even if it needed model training, I suspect it really wouldn't be all that much.

EDIT: Or if all else fails, perhaps integration of something such as https://github.com/danielgatis/rembg might at least be a good stopgap, though it's cleaner and better if it generates a background-less image to begin with.

NoteToSelfFindGoodNickname commented 1 year ago

Perfectly agree with you. I need this so much! But I also need a drop shadow in the alpha channel.

ReviveChan commented 1 year ago

An alpha channel would be really useful in some complex situations, for example creating an image or a video by generating each object separately. Hope we can have this ability soon.

Nazosan commented 1 year ago

I want to add that I've tried both the separate original rembg and the one integrated into Stable-Diffusion-WebUI, and both are lacking in some areas. As a prime example, hair in particular fails badly with rembg. In the end, the only thing that really helps is to render at the highest resolution I can get away with, with a simple single-color background, resize upwards a bit, manually remove the background, then resize downward to smooth out the edges. Though not all things resize well.

Perhaps it could be keyed to replace something very specific, like a keyword? Something where, when a specific keyword is used (say, "transparency purple"), it would treat both that color itself and the blending to/from it as alpha instead. Perhaps this could be handled somewhat externally, similarly to rembg but more integrated, making it independent of the models.
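Treating a reserved key colour as graded alpha rather than a hard cut, as suggested above, could look something like this sketch (pure Python; the key colour and falloff radius are made-up parameters):

```python
def soft_alpha(rgb_pixels, key=(128, 0, 128), radius=80.0):
    """Map distance from the key colour to alpha: pixels at the key colour
    become fully transparent, pixels `radius` or further away fully opaque,
    and pixels in between get intermediate alpha, preserving soft edges
    (anti-aliased outlines, hair) instead of a hard cutout."""
    out = []
    for (r, g, b) in rgb_pixels:
        # Euclidean distance from the key colour in RGB space.
        dist = ((r - key[0]) ** 2 + (g - key[1]) ** 2 + (b - key[2]) ** 2) ** 0.5
        a = min(255, int(255 * dist / radius))
        out.append((r, g, b, a))
    return out

pixels = [(128, 0, 128), (130, 10, 120), (255, 255, 255)]
print(soft_alpha(pixels))  # key pixel -> alpha 0, edge pixel -> partial, far pixel -> 255
```

The partial alpha on near-key pixels is what a hard chroma key (like the rembg-style cutouts discussed above) loses on hair and soft edges.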