Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Model load error #229

Status: Open · Ming-XF opened this issue 1 year ago

Ming-XF commented 1 year ago

I downloaded the 768 model weights, and running the following command fails with the traceback below. May I ask why?

(myconda) root@ZBjlKr:/mnt/expriment/mission/stablediffusion# python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/768model.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
Global seed set to 42
Loading model from checkpoints/768model.ckpt
Global Step: 110000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "/mnt/expriment/mission/stablediffusion/scripts/txt2img.py", line 388, in <module>
    main(opt)
  File "/mnt/expriment/mission/stablediffusion/scripts/txt2img.py", line 219, in main
    model = load_model_from_config(config, f"{opt.ckpt}", device)
  File "/mnt/expriment/mission/stablediffusion/scripts/txt2img.py", line 34, in load_model_from_config
    model = instantiate_from_config(config.model)
  File "/mnt/expriment/mission/stablediffusion/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/mnt/expriment/mission/stablediffusion/ldm/models/diffusion/ddpm.py", line 563, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "/mnt/expriment/mission/stablediffusion/ldm/models/diffusion/ddpm.py", line 630, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "/mnt/expriment/mission/stablediffusion/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/mnt/expriment/mission/stablediffusion/ldm/util.py", line 97, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
  File "/root/miniconda3/envs/myconda/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/mnt/expriment/mission/stablediffusion/ldm/modules/encoders/modules.py", line 3, in <module>
    import kornia
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/__init__.py", line 10, in <module>
    from . import (
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/__init__.py", line 43, in <module>
    from .container import AugmentationSequential, ImageSequential, PatchSequential, VideoSequential
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/container/__init__.py", line 1, in <module>
    from .augment import AugmentationSequential
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/container/augment.py", line 17, in <module>
    from .patch import PatchSequential
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/container/patch.py", line 9, in <module>
    from kornia.contrib.extract_patches import extract_tensor_patches
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/contrib/__init__.py", line 4, in <module>
    from .image_stitching import ImageStitcher
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/contrib/image_stitching.py", line 7, in <module>
    from kornia.feature import LocalFeatureMatcher, LoFTR
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/__init__.py", line 4, in <module>
    from .integrated import (
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/integrated.py", line 52, in <module>
    class LAFDescriptor(nn.Module):
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/integrated.py", line 65, in LAFDescriptor
    patch_descriptor_module: nn.Module = HardNet(True),
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/hardnet.py", line 66, in __init__
    pretrained_dict = torch.hub.load_state_dict_from_url(
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/hub.py", line 731, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x12'.
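Reading this traceback from the bottom up, the UnpicklingError is not raised while loading 768model.ckpt at all: it happens inside torch.hub.load_state_dict_from_url, which kornia's HardNet descriptor uses to fetch its own pretrained weights when ldm imports kornia. An "invalid load key" there usually means the cached download (by default under ~/.cache/torch/hub/checkpoints) is truncated, or is an HTML error page left behind by a failed or proxied request. A minimal sketch of a fix, assuming the default cache location; the scan-and-delete helper below is illustrative, not part of the repo:

```python
# Sketch: probe every file in torch.hub's checkpoint cache and delete the
# ones that cannot be unpickled, so the next run re-downloads them cleanly.
import os
import torch

# torch.hub.get_dir() typically resolves to ~/.cache/torch/hub
ckpt_dir = os.path.join(torch.hub.get_dir(), "checkpoints")
if os.path.isdir(ckpt_dir):
    for name in os.listdir(ckpt_dir):
        path = os.path.join(ckpt_dir, name)
        try:
            torch.load(path, map_location="cpu")  # load test only
        except Exception as exc:  # truncated/corrupt download
            print(f"removing unloadable cache file {path}: {exc}")
            os.remove(path)
else:
    print(f"no hub cache at {ckpt_dir}; nothing to clean")
```

After purging, re-running the original txt2img.py command should trigger a fresh download of the HardNet weights; if the machine sits behind a proxy that cannot reach the download host, the same corrupt file will simply reappear.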

colin-heberling commented 1 year ago

I'm getting a similar error when loading/running the model. I'm on Windows 10, running inside the Ubuntu app. I installed the ldm conda environment provided in the repository and, as far as I know, all of the dependencies plus the xformers efficient attention (I got the same error before installing xformers). I also tried swapping configs between the two checkpoints provided on the Hugging Face website, but each checkpoint/config pairing seems to give a different error. Which checkpoints and configs should I be using? See below for the error messages from one such pairing:

(ldm) cheberling@DESKTOP-HEQ6C9H:/mnt/d/Pictures/stablediffusion$ python ./scripts/txt2img.py --from-file ./inputs/test1.txt --outdir ./outputs --config ./configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml --ckpt ./checkpoints/sd21-unclip-h.ckpt
Global seed set to 42
Loading model from ./checkpoints/sd21-unclip-h.ckpt
Global Step: 130000
ImageEmbeddingConditionedLatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
[... 30 more MemoryEfficientCrossAttention setup lines for the remaining 320/640/1280-dim blocks ...]
DiffusionWrapper has 870.17 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
reading prompts from ./inputs/test1.txt
Sampling:   0%|          | 0/3 [00:00<?, ?it/s]
Warning: Got 1 conditionings but batch-size is 3
data:   0%|          | 0/1 [00:00<?, ?it/s]
Data shape for DDIM sampling is (3, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler:   0%|          | 0/50 [00:00<?, ?it/s]
data:   0%|          | 0/1 [00:01<?, ?it/s]
Sampling:   0%|          | 0/3 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "./scripts/txt2img.py", line 388, in <module>
    main(opt)
  File "./scripts/txt2img.py", line 347, in main
    samples, _ = sampler.sample(S=opt.steps,
  File "/home/cheberling/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/Pictures/stablediffusion/ldm/models/diffusion/ddim.py", line 104, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "/home/cheberling/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/Pictures/stablediffusion/ldm/models/diffusion/ddim.py", line 164, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/home/cheberling/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/Pictures/stablediffusion/ldm/models/diffusion/ddim.py", line 212, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/mnt/d/Pictures/stablediffusion/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/cheberling/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/Pictures/stablediffusion/ldm/models/diffusion/ddpm.py", line 1346, in forward
    assert c_adm is not None
AssertionError
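This second failure is a different problem from the first. The assertion `assert c_adm is not None` fires in ImageEmbeddingConditionedLatentDiffusion.forward because the unCLIP checkpoints condition on an extra image (CLIP) embedding, passed in as c_adm, which plain scripts/txt2img.py never constructs; the README points these models at the separate streamlit unCLIP demo instead. Pure text-to-image sampling needs one of the text-conditioned v2.x checkpoints with its matching config. A rough sketch of the pairings as I read the model cards; entries marked "assumed" are my best guesses, not confirmed by this thread:

```python
# Hypothetical lookup (no such table exists in the repo): which config each
# published checkpoint should be loaded with.
CKPT_TO_CONFIG = {
    # Text-conditioned models: compatible with scripts/txt2img.py.
    "v2-1_768-ema-pruned.ckpt": "configs/stable-diffusion/v2-inference-v.yaml",
    "v2-1_512-ema-pruned.ckpt": "configs/stable-diffusion/v2-inference.yaml",  # assumed
    # Image-embedding-conditioned (unCLIP) models: require c_adm, so
    # txt2img.py alone will always trip the assertion shown above.
    "sd21-unclip-h.ckpt": "configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml",
    "sd21-unclip-l.ckpt": "configs/stable-diffusion/v2-1-stable-unclip-l-inference.yaml",  # assumed
}
```

For the original poster's 768 checkpoint, the pairing in the first command (a v-prediction checkpoint with v2-inference-v.yaml) was already correct; the crash there was the kornia weight download, not the config.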