Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Program stuck on "Sampling" #260

Open s-sangwon opened 1 year ago

s-sangwon commented 1 year ago

(base) C:\Users\GTX73\stablediffusion>python scripts/txt2img.py --prompt "our galaxy itself contains a hundred billion stars" --ckpt C:\Users\GTX73\Downloads\768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 --n_samples 1
Global seed set to 42
Loading model from C:\Users\GTX73\Downloads\768-v-ema.ckpt
Global Step: 140000
LatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling: 0%| | 0/3 [00:00<?, ?it/s]
Data shape for DDIM sampling is (1, 4, 96, 96), eta 0.0
Running DDIM Sampling with 50 timesteps | 0/1 [00:00<?, ?it/s]

DDIM Sampler: 0%| | 0/50 [00:00<?, ?it/s]

[screenshot: the console output above, stuck at 0% on the DDIM Sampler progress bar]

How can I solve this problem?
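(Aside, not from the original report: when the progress bar sits at 0% like this, it is worth first confirming that PyTorch can actually see a CUDA device, since a 768x768 sample on CPU can look "stuck" while really just being extremely slow. A minimal check, assuming a standard PyTorch install:)

```python
# Sketch of a quick environment check; not part of the original report.
# If CUDA is unavailable, sampling may be running on the CPU (depending on
# the script version) and will appear to hang at 0% for a long time.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("total VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
```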

hotelbread commented 1 year ago

Same here. I also just ran the example commands:

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/v2-1_512-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference.yaml

and

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768

and got the same result as in the screenshot above.
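(Aside: since the two commands differ only in which checkpoint/config pair they use, one easy thing to verify is which parameterization each config requests. A rough sketch with OmegaConf, assuming the standard layout of the configs in this repo:)

```python
# Sketch: print the parameterization requested by each inference config.
# Assumes the usual model.params structure of configs/stable-diffusion/*.yaml;
# the v-prediction config is expected to set parameterization: "v", while the
# epsilon-prediction config omits the key (defaulting to "eps").
from omegaconf import OmegaConf

for path in (
    "configs/stable-diffusion/v2-inference.yaml",
    "configs/stable-diffusion/v2-inference-v.yaml",
):
    cfg = OmegaConf.load(path)
    print(path, "->", cfg.model.params.get("parameterization", "eps"))
```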

ZXStudio commented 1 year ago

I also have the same problem

IceClear commented 1 year ago

I guess the code currently does not support v-prediction sampling: https://github.com/Stability-AI/stablediffusion/blob/main/ldm/models/diffusion/ddpm.py#L920
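(For context: with v-prediction, the network outputs v = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x0 instead of eps, so a sampler has to convert the model output back to x0/eps before it can take a step. The sketch below is my own illustration of that standard conversion, not code copied from ddpm.py:)

```python
import torch

def v_to_x0_and_eps(x_t, v, alphas_cumprod, t):
    """Convert a v-prediction back to (x0, eps) for a DDIM/DDPM step.

    Illustrative only; it restates the usual v-parameterization identities
    (Salimans & Ho, "Progressive Distillation"), not the repository's code.
    x_t: noisy latents (B, C, H, W); v: model output with the same shape;
    alphas_cumprod: 1-D tensor over timesteps; t: batch of timestep indices.
    """
    a = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)          # sqrt(alpha_bar_t)
    s = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)  # sqrt(1 - alpha_bar_t)
    x0 = a * x_t - s * v    # predicted clean latent
    eps = a * v + s * x_t   # equivalent epsilon prediction
    return x0, eps
```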