Ming-XF opened this issue 1 year ago (status: Open)
I'm getting a similar error when loading and running the model. I'm on Windows 10, using the Ubuntu app (WSL). I installed everything with the ldm conda environment provided in the repository and, as far as I know, installed all dependencies as well as xformers for memory-efficient attention (I got the same error before installing xformers). I also tried different configs and both of the checkpoints provided on the Hugging Face website, but each checkpoint/config pairing gives a different error. Which checkpoints and configs should I be using? Below are the error messages for one such pairing:
(ldm) cheberling@DESKTOP-HEQ6C9H:/mnt/d/Pictures/stablediffusion$ python ./scripts/txt2img.py --from-file ./inputs/test1.txt --outdir ./outputs --config ./configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml --ckpt ./checkpoints/sd21-unclip-h.ckpt
Global seed set to 42
Loading model from ./checkpoints/sd21-unclip-h.ckpt
Global Step: 130000
ImageEmbeddingConditionedLatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 870.17 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
reading prompts from ./inputs/test1.txt
Sampling: 0%| | 0/3 [00:00<?, ?it/sWarning: Got 1 conditionings but batch-size is 3 | 0/1 [00:00<?, ?it/s]
Data shape for DDIM sampling is (3, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 0%| | 0/50 [00:00<?, ?it/s]
data: 0%| | 0/1 [00:01<?, ?it/s]
Sampling: 0%| | 0/3 [00:01<?, ?it/s]
Traceback (most recent call last):
File "./scripts/txt2img.py", line 388, in <module>
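On the question of which checkpoint goes with which config, here is a minimal sketch of the pairing as we read it in the Stability-AI/stablediffusion README; treat it as an assumption to double-check against the README in your own checkout. The 768 v-prediction checkpoints go with v2-inference-v.yaml, the 512 base checkpoints with v2-inference.yaml, and each unCLIP checkpoint with its matching unclip config. Our understanding is also that the unCLIP checkpoints are image-variation models that the README runs through scripts/streamlit/stableunclip.py rather than scripts/txt2img.py. The table and the pick_config and suggested_script helpers are hypothetical and not part of the repository.

```python
from pathlib import Path

# Hypothetical table: checkpoint file names as published on Hugging Face, paired with
# the inference configs we understand the stablediffusion README to recommend.
# Verify against the README; this is not part of the repository.
CKPT_TO_CONFIG = {
    "512-base-ema.ckpt": "configs/stable-diffusion/v2-inference.yaml",
    "v2-1_512-ema-pruned.ckpt": "configs/stable-diffusion/v2-inference.yaml",
    "768-v-ema.ckpt": "configs/stable-diffusion/v2-inference-v.yaml",
    "v2-1_768-ema-pruned.ckpt": "configs/stable-diffusion/v2-inference-v.yaml",
    "sd21-unclip-l.ckpt": "configs/stable-diffusion/v2-1-stable-unclip-l-inference.yaml",
    "sd21-unclip-h.ckpt": "configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml",
}

# Assumption: unCLIP checkpoints are image-variation models, run via
# scripts/streamlit/stableunclip.py rather than scripts/txt2img.py.
UNCLIP_CKPTS = {"sd21-unclip-l.ckpt", "sd21-unclip-h.ckpt"}


def pick_config(ckpt_path: str) -> str:
    """Return the config paired with a checkpoint, keyed on its file name only."""
    name = Path(ckpt_path).name
    if name not in CKPT_TO_CONFIG:
        # Renamed files (e.g. "768model.ckpt") cannot be matched by name.
        raise ValueError(f"no known config for {name!r}; look it up manually")
    return CKPT_TO_CONFIG[name]


def suggested_script(ckpt_path: str) -> str:
    """Return the sampling script we would expect to use for this checkpoint."""
    if Path(ckpt_path).name in UNCLIP_CKPTS:
        return "scripts/streamlit/stableunclip.py"
    return "scripts/txt2img.py"


if __name__ == "__main__":
    ckpt = "./checkpoints/sd21-unclip-h.ckpt"
    print(pick_config(ckpt), suggested_script(ckpt))
```

Keying on the file name keeps the sketch dependency-free; inspecting the checkpoint's contents would be more robust but would require loading it with torch.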
I downloaded the 768 model weights and got the error below after running the following command. May I ask why?
(myconda) root@ZBjlKr:/mnt/expriment/mission/stablediffusion# python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/768model.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
Global seed set to 42
Loading model from checkpoints/768model.ckpt
Global Step: 110000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
File "/mnt/expriment/mission/stablediffusion/scripts/txt2img.py", line 388, in <module>
main(opt)
File "/mnt/expriment/mission/stablediffusion/scripts/txt2img.py", line 219, in main
model = load_model_from_config(config, f"{opt.ckpt}", device)
File "/mnt/expriment/mission/stablediffusion/scripts/txt2img.py", line 34, in load_model_from_config
model = instantiate_from_config(config.model)
File "/mnt/expriment/mission/stablediffusion/ldm/util.py", line 89, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/mnt/expriment/mission/stablediffusion/ldm/models/diffusion/ddpm.py", line 563, in __init__
self.instantiate_cond_stage(cond_stage_config)
File "/mnt/expriment/mission/stablediffusion/ldm/models/diffusion/ddpm.py", line 630, in instantiate_cond_stage
model = instantiate_from_config(config)
File "/mnt/expriment/mission/stablediffusion/ldm/util.py", line 89, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/mnt/expriment/mission/stablediffusion/ldm/util.py", line 97, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
File "/root/miniconda3/envs/myconda/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/mnt/expriment/mission/stablediffusion/ldm/modules/encoders/modules.py", line 3, in <module>
import kornia
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/__init__.py", line 10, in <module>
from . import (
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/__init__.py", line 43, in <module>
from .container import AugmentationSequential, ImageSequential, PatchSequential, VideoSequential
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/container/__init__.py", line 1, in <module>
from .augment import AugmentationSequential
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/container/augment.py", line 17, in <module>
from .patch import PatchSequential
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/augmentation/container/patch.py", line 9, in <module>
from kornia.contrib.extract_patches import extract_tensor_patches
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/contrib/__init__.py", line 4, in <module>
from .image_stitching import ImageStitcher
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/contrib/image_stitching.py", line 7, in <module>
from kornia.feature import LocalFeatureMatcher, LoFTR
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/__init__.py", line 4, in <module>
from .integrated import (
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/integrated.py", line 52, in <module>
class LAFDescriptor(nn.Module):
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/integrated.py", line 65, in LAFDescriptor
patch_descriptor_module: nn.Module = HardNet(True),
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/kornia/feature/hardnet.py", line 66, in __init__
pretrained_dict = torch.hub.load_state_dict_from_url(
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/hub.py", line 731, in load_state_dict_from_url
return torch.load(cached_file, map_location=map_location)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x12'.
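The 768 traceback shows the crash happens while kornia is being imported, before the Stable Diffusion weights are used: defining kornia's LAFDescriptor class evaluates the default argument HardNet(True), which fetches pretrained weights through torch.hub.load_state_dict_from_url, and torch.load then rejects the cached file with invalid load key '\x12'. One plausible reading, which is an assumption the log does not confirm, is that the cached download is corrupt or incomplete. The sketch below uses only the standard library to flag files in the torch hub checkpoint cache whose first bytes match neither format torch.save writes: the zip archive (starting with PK\x03\x04) or the legacy pickle stream (starting with the pickle PROTO opcode \x80). The function names are hypothetical, and the cache-location logic mirrors our understanding of torch.hub.get_dir() (TORCH_HOME, else XDG_CACHE_HOME/torch, else ~/.cache/torch), ignoring any torch.hub.set_dir() override.

```python
import os
from pathlib import Path

# First bytes of the two formats torch.save is known to write:
ZIP_MAGIC = b"PK\x03\x04"  # zip-based format (the default since PyTorch 1.6)
PICKLE_PROTO = b"\x80"     # legacy format: a pickle stream opening with the PROTO opcode


def default_hub_checkpoint_dir() -> Path:
    """Guess torch.hub's checkpoint cache directory.

    Assumption: mirrors torch.hub.get_dir() as we understand it (TORCH_HOME, else
    $XDG_CACHE_HOME/torch, else ~/.cache/torch, plus /hub). set_dir() is ignored.
    """
    torch_home = os.environ.get("TORCH_HOME")
    if torch_home is None:
        cache_root = os.environ.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
        torch_home = os.path.join(cache_root, "torch")
    return Path(os.path.expanduser(torch_home)) / "hub" / "checkpoints"


def classify(path: Path) -> str:
    """Label a file by its leading bytes: 'zip', 'legacy-pickle', or 'unrecognized'."""
    with open(path, "rb") as f:
        head = f.read(len(ZIP_MAGIC))
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(PICKLE_PROTO):
        return "legacy-pickle"
    return "unrecognized"


def suspicious_files(directory) -> list:
    """Sorted names of files in `directory` that match neither torch.save format."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and classify(p) == "unrecognized"
    )


if __name__ == "__main__":
    cache_dir = default_hub_checkpoint_dir()
    print(cache_dir, suspicious_files(cache_dir))
```

If a file is flagged, deleting it so that kornia downloads it again is a common remedy, though we have not verified that it resolves this particular report. Checking only the leading bytes is cheap and avoids needing torch installed, but it cannot detect a zip archive that was truncated partway through.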