Open hellangleZ opened 1 year ago
When I use the official demo images, like the cat or other pictures turned into Lego style, everything works fine, but as soon as I switch to one of my own pictures for generation, it triggers this bug.
Environment:
A100, Ubuntu 18.04
pip list:

Package  Version  Editable project location
accelerate  0.18.0
antlr4-python3-runtime  4.9.3
anyio  3.5.0
argon2-cffi  21.3.0
argon2-cffi-bindings  21.2.0
asttokens  2.0.5
attrs  22.1.0
Babel  2.11.0
backcall  0.2.0
beautifulsoup4  4.12.2
bleach  4.1.0
brotlipy  0.7.0
cchardet  2.1.7
certifi  2022.12.7
cffi  1.15.1
chardet  5.1.0
charset-normalizer  3.1.0
clip  1.0  /aml/CLIP-main
cmake  3.26.3
comm  0.1.2
contourpy  1.0.7
cryptography  39.0.1
cycler  0.11.0
debugpy  1.5.1
decorator  5.1.1
deepfloyd-if  1.0.1
defusedxml  0.7.1
diffusers  0.16.1
entrypoints  0.4
executing  0.8.3
fastjsonschema  2.16.2
filelock  3.12.0
fonttools  4.39.3
fsspec  2023.4.0
ftfy  6.1.1
huggingface-hub  0.14.1
idna  3.4
importlib-metadata  6.6.0
ipykernel  6.19.2
ipython  8.12.0
ipython-genutils  0.2.0
ipywidgets  8.0.4
jedi  0.18.1
Jinja2  3.1.2
json5  0.9.6
jsonschema  4.17.3
jupyter  1.0.0
jupyter_client  8.1.0
jupyter-console  6.6.3
jupyter_core  5.3.0
jupyter-server  1.23.4
jupyterlab  3.5.3
jupyterlab-pygments  0.1.2
jupyterlab_server  2.22.0
jupyterlab-widgets  3.0.5
kiwisolver  1.4.4
lit  16.0.2
lxml  4.9.2
MarkupSafe  2.1.2
matplotlib  3.7.1
matplotlib-inline  0.1.6
mistune  0.8.4
mpmath  1.3.0
mypy-extensions  1.0.0
nbclassic  0.5.5
nbclient  0.5.13
nbconvert  6.5.4
nbformat  5.7.0
nest-asyncio  1.5.6
networkx  3.1
notebook  6.5.4
notebook_shim  0.2.2
numpy  1.24.3
nvidia-cublas-cu11  11.10.3.66
nvidia-cuda-cupti-cu11  11.7.101
nvidia-cuda-nvrtc-cu11  11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11  8.5.0.96
nvidia-cufft-cu11  10.9.0.58
nvidia-curand-cu11  10.2.10.91
nvidia-cusolver-cu11  11.4.0.1
nvidia-cusparse-cu11  11.7.4.91
nvidia-nccl-cu11  2.14.3
nvidia-nvtx-cu11  11.7.91
omegaconf  2.3.0
packaging  23.1
pandocfilters  1.5.0
parso  0.8.3
pexpect  4.8.0
pickleshare  0.7.5
Pillow  9.5.0
pip  23.0.1
platformdirs  2.5.2
ply  3.11
prometheus-client  0.14.1
prompt-toolkit  3.0.36
protobuf  3.19.0
psutil  5.9.5
ptyprocess  0.7.0
pure-eval  0.2.2
pycparser  2.21
Pygments  2.11.2
pyOpenSSL  23.0.0
pyparsing  3.0.9
PyQt5-sip  12.11.0
pyre-extensions  0.0.29
pyrsistent  0.18.0
PySocks  1.7.1
python-dateutil  2.8.2
pytz  2022.7
PyYAML  6.0
pyzmq  25.0.2
qtconsole  5.4.2
QtPy  2.2.0
regex  2023.3.23
requests  2.29.0
safetensors  0.3.1
Send2Trash  1.8.0
sentencepiece  0.1.99
setuptools  66.0.0
sip  6.6.2
six  1.16.0
sniffio  1.2.0
soupsieve  2.4.1
stack-data  0.2.0
sympy  1.11.1
terminado  0.17.1
tinycss2  1.2.1
tokenizers  0.13.3
toml  0.10.2
tomli  2.0.1
torch  2.0.0+cu118
torchaudio  0.13.1
torchvision  0.14.1
tornado  6.2
tqdm  4.65.0
traitlets  5.7.1
transformers  4.28.1
triton  2.0.0
typing_extensions  4.5.0
typing-inspect  0.8.0
urllib3  1.26.15
wcwidth  0.2.6
webencodings  0.5.1
websocket-client  0.58.0
wheel  0.38.4
widgetsnbextension  4.0.5
xformers  0.0.19
zipp  3.15.0
same here
I wonder if it has to do with image dimensions? It seems that the support_noise tensor has a different shape than expected.
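A quick way to test that hypothesis (hedged sketch: 'my_photo.jpg' is a placeholder path, and "square like the demo images" is my assumption about what the pipeline expects):

from PIL import Image

img = Image.open('my_photo.jpg')  # placeholder: the custom picture that triggers the error
w, h = img.size
print(f'input size: {w}x{h}, aspect ratio: {w / h:.3f}')
# If this is not square like the demo images, the support_noise tensor derived
# from it may not match the (1, 3, image_h, image_w) shape that the assert in
# modules/base.py checks.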
I thought so too. I left it running overnight on a custom image of the same dimensions and it worked!
Probably something dimensions-related; I'm also trying a bunch of other things and running once more with the original dimensions just to make sure.
Just confirmed that I got the same error after not resizing -- this definitely has something to do with the resizing... Trying again after resizing the image outside of the script, which I suppose works as a workaround for now...
Will try to dive into the root cause if it really is dimensions-related.
Under pipelines/style_transfer.py there is some aspect-ratio handling, and under modules/base.py there is a _get_image_sizes function; putting some print statements in those two places might yield some clues (see the sketch below).
Are there any other sections of the source code we should look into?
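For anyone who wants to try the print-statement idea, a minimal sketch (only the print lines are new; the variable names come from the traceback, and the line numbers refer to the installed deepfloyd-if 1.0.1 package):

# deepfloyd_if/modules/base.py, just above the failing assert (around line 181):
print('support_noise shape:', tuple(support_noise.shape))
print('expected shape:', (1, 3, image_h, image_w))
print('aspect_ratio / img_size:', aspect_ratio, img_size)

# deepfloyd_if/pipelines/style_transfer.py, before mid_res is handed to
# stage II (around line 89):
print('mid_res shape:', tuple(mid_res.shape))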
Just to add that resizing the image beforehand also makes everything work, so it's not an issue with any of the Python libraries (that was a low chance, but I have run into such a problem once before). The resizing/aspect-ratio handling seems to be the main suspect at the moment (sketch below).
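For reference, this is roughly what "resizing outside of the script" looks like (hedged sketch: the center-crop and the 1024x1024 target are my assumptions, not values taken from the library; the point is just to hand style_transfer a square image like the demo pictures):

from PIL import Image

raw = Image.open('my_photo.jpg').convert('RGB')  # placeholder path
side = min(raw.size)
left = (raw.width - side) // 2
top = (raw.height - side) // 2
# Center-crop to a square, then resize; pass the result as support_pil_img.
square = raw.crop((left, top, left + side, top + side))
support_pil_img = square.resize((1024, 1024), Image.LANCZOS)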
AssertionError    Traceback (most recent call last)
Cell In[24], line 4
      1 count = 4
      2 prompt = 'a boy'
----> 4 result = style_transfer(
      5     t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
      6     support_pil_img=zkc,
      7     prompt=[prompt]*count,
      8     style_prompt=[
      9         f'in style lego',
     10         f'in style zombie',
     11         f'in style origami',
     12         f'in style anime',
     13     ],
     14     seed=42,
     15     if_I_kwargs={
     16         "guidance_scale": 10.0,
     17         "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
     18         'support_noise_less_qsample_steps': 5,
     19         'positive_mixer': 0.8,
     20     },
     21     if_II_kwargs={
     22         "guidance_scale": 4.0,
     23         "sample_timestep_respacing": 'smart50',
     24         "support_noise_less_qsample_steps": 5,
     25         'positive_mixer': 1.0,
     26     },
     27 )
     28 if_I.show(result['III'], 2, 14)

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/pipelines/style_transfer.py:91, in style_transfer(t5, if_I, if_II, if_III, support_pil_img, style_prompt, prompt, negative_prompt, seed, if_I_kwargs, if_II_kwargs, if_III_kwargs, progress, return_tensors, disable_watermark)
     87 if_II_kwargs['progress'] = progress
     89 if_II_kwargs['support_noise'] = mid_res
---> 91 stageII_generations, _meta = if_II.embeddings_to_image(**if_II_kwargs)
     92 pil_images_II = if_II.to_images(stageII_generations, disable_watermark=disable_watermark)
     94 result['II'] = pil_images_II

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/modules/stage_II.py:26, in IFStageII.embeddings_to_image(self, low_res, t5_embs, style_t5_embs, positive_t5_embs, negative_t5_embs, batch_repeat, aug_level, dynamic_thresholding_p, dynamic_thresholding_c, sample_loop, sample_timestep_respacing, guidance_scale, img_scale, positive_mixer, progress, seed, sample_fn, **kwargs)
     21 def embeddings_to_image(
     22         self, low_res, t5_embs, style_t5_embs=None, positive_t5_embs=None, negative_t5_embs=None, batch_repeat=1,
     23         aug_level=0.25, dynamic_thresholding_p=0.95, dynamic_thresholding_c=1.0, sample_loop='ddpm',
     24         sample_timestep_respacing='smart50', guidance_scale=4.0, img_scale=4.0, positive_mixer=0.5,
     25         progress=True, seed=None, sample_fn=None, **kwargs):
---> 26     return super().embeddings_to_image(
     27         t5_embs=t5_embs,
     28         low_res=low_res,
     29         style_t5_embs=style_t5_embs,
     30         positive_t5_embs=positive_t5_embs,
     31         negative_t5_embs=negative_t5_embs,
     32         batch_repeat=batch_repeat,
     33         aug_level=aug_level,
     34         dynamic_thresholding_p=dynamic_thresholding_p,
     35         dynamic_thresholding_c=dynamic_thresholding_c,
     36         sample_loop=sample_loop,
     37         sample_timestep_respacing=sample_timestep_respacing,
     38         guidance_scale=guidance_scale,
     39         positive_mixer=positive_mixer,
     40         img_size=256,
     41         img_scale=img_scale,
     42         progress=progress,
     43         seed=seed,
     44         sample_fn=sample_fn,
     45         **kwargs
     46     )

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/modules/base.py:181, in IFBaseModule.embeddings_to_image(self, t5_embs, low_res, style_t5_embs, positive_t5_embs, negative_t5_embs, batch_repeat, dynamic_thresholding_p, sample_loop, sample_timestep_respacing, dynamic_thresholding_c, guidance_scale, aug_level, positive_mixer, blur_sigma, img_size, img_scale, aspect_ratio, progress, seed, sample_fn, support_noise, support_noise_less_qsample_steps, inpainting_mask, **kwargs)
    179 else:
    180     assert support_noise_less_qsample_steps < len(diffusion.timestep_map) - 1
--> 181     assert support_noise.shape == (1, 3, image_h, image_w)
    182     q_sample_steps = torch.tensor([int(len(diffusion.timestep_map) - 1 - support_noise_less_qsample_steps)])
    183     support_noise = support_noise.cpu()
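One more hedged idea based on the frames above: embeddings_to_image in modules/base.py accepts an aspect_ratio argument, and style_transfer forwards if_II_kwargs straight into it, so an explicit aspect ratio matching the photo could be passed as an experiment. This is untested, the '3:4' string below is only a hypothetical value, and resizing the image beforehand remains the known-good workaround:

if_II_kwargs = {
    "guidance_scale": 4.0,
    "sample_timestep_respacing": 'smart50',
    "support_noise_less_qsample_steps": 5,
    'positive_mixer': 1.0,
    # Hypothetical: match this to the photo's proportions, and check how
    # _get_image_sizes parses the string before relying on this format.
    'aspect_ratio': '3:4',
}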