CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Different img2img results from DreamStudio #411

Closed iutlu closed 4 months ago

iutlu commented 2 years ago

I've been trying to reproduce the img2img results that I got from DreamStudio locally, using the CompVis SD repo, but with no success.

Here's the command I'm running locally: python scripts/img2img.py --prompt "hyperrealistic dslr film still of a person, face disguised as legumes, stunning 8 k octane comprehensive 3 d render, inspired by istvan sandorfi & greg rutkowski & unreal engine, perfect symmetry, dim volumetric cinematic lighting, extremely hyper - detailed, incredibly real lifelike attributes & flesh texture, intricate, masterpiece, artstation, stunning" --init-img /home/onur/Desktop/test.png --strength 0.56 --ddim_steps 25 --seed 290400995 --scale 7 --n_samples 1 --skip_grid

So I have an init-img test.png, with resolution 512x512 (see below)

[init image: test]

And I run img2img on it with the prompt

hyperrealistic dslr film still of a person, face disguised as legumes, stunning 8 k octane comprehensive 3 d render, inspired by istvan sandorfi & greg rutkowski & unreal engine, perfect symmetry, dim volumetric cinematic lighting, extremely hyper - detailed, incredibly real lifelike attributes & flesh texture, intricate, masterpiece, artstation, stunning

(^some random prompt)

with strength=0.56, ddim_steps=25, seed=290400995, scale=7

The output (with the 1.4 checkpoint) is: [image: 00070]

Going over to DreamStudio, I configure it with the following settings: Cfg Scale=7, Steps=25, Number of Images=1, Sampler=ddim, Model=Stable Diffusion v1.4, Seed=290400995, CLIP Guidance=Off, Image Strength=44% (DreamStudio's strength range seems to be inverted relative to the script's, hence the 1-minus: 1 - 0.56 = 0.44)
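The relationship between the two strength conventions can be checked with simple arithmetic (a sketch; that DreamStudio's "Image Strength" slider is exactly the inverse of the script's `--strength` is an assumption inferred from the settings above, not documented behavior):

```python
# --strength in scripts/img2img.py: fraction of the diffusion schedule
# that is re-noised (higher = more deviation from the init image).
local_strength = 0.56

# DreamStudio's "Image Strength" appears to measure how much of the init
# image is *preserved*, i.e. the inverse convention (assumption).
dreamstudio_image_strength = 1.0 - local_strength

print(f"{dreamstudio_image_strength:.0%}")  # 44%
```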

The output is: [image: ds]

I was wondering, am I missing something? Shouldn't I have gotten the same results with identical prompts, init-img, seed, and config? Is there some known explanation I'm not aware of?

Thanks!

DjTecmo commented 2 years ago

I also get these messy results for no apparent reason, when it's supposed to produce coherent images the way DreamStudio or NightCafe Studio do. The results here look as if the model were the old, messy, deformed "Guided Diffusion" model, really. I need an explanation too.

popoala commented 1 year ago

I think it has something to do with the img2img.py script not running the reconstruction for all N time steps.
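To make this concrete: the released scripts/img2img.py encodes the init image partway into the noise schedule and then denoises only a fraction of the DDIM steps, computed as `t_enc = int(strength * ddim_steps)`. Two implementations that round this differently, offset the schedule differently, or draw the encoding noise differently will diverge even with identical prompts and seeds. A sketch using the settings from this issue:

```python
# Settings from the command in this issue.
ddim_steps = 25
strength = 0.56

# scripts/img2img.py noises the init latent up to step t_enc and then
# runs only t_enc of the ddim_steps denoising steps, not all of them.
t_enc = int(strength * ddim_steps)

print(f"denoising steps actually run: {t_enc} of {ddim_steps}")  # 14 of 25
```

So with strength 0.56 and 25 steps, only 14 reverse steps are performed locally; any mismatch in this step count or in where the schedule starts would explain different outputs despite a shared seed.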