alibaba / EasyNLP

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit

RuntimeError: The size of tensor a (4608) must match the size of tensor b (6144) at non-singleton dimension 1 #335

Closed · jxlinnn closed 1 year ago

jxlinnn commented 1 year ago

Hello, I have been running the DiffSynth model for Fashion Image Synthesis, but I keep hitting a runtime error about mismatched tensor sizes. I have tried changing the frame height and width of the input video and of the reference image, but the error persists. I have attached my config file; it would be great if you could help with this issue. Thank you!

Preparing images for ControlNet: 100% 10/10 [00:05<00:00, 1.72it/s]
Estimating pose: 100% 10/10 [00:19<00:00, 1.90s/it]
Drawing pose: 100% 10/10 [00:00<00:00, 89.66it/s]
Preparing images for ControlNet: 0% 0/1 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py:1090: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
  warnings.warn(
Preparing images for ControlNet: 100% 1/1 [00:00<00:00, 6.23it/s]
Estimating pose: 100% 1/1 [00:01<00:00, 1.78s/it]
Drawing pose: 100% 1/1 [00:00<00:00, 117.02it/s]
Denoising: 0% 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/.shortcut-targets-by-id/1GW2UAkeoROM8fQ_lY0T7FLxc-KeENpax/EasyNLP/diffusion/DiffSynth/run_DiffSynth.py", line 92, in <module>
    results = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/1GW2UAkeoROM8fQ_lY0T7FLxc-KeENpax/EasyNLP/diffusion/DiffSynth/DiffSynth/pipeline.py", line 370, in __call__
    down_res_posi, mid_res_posi = self.controlnet(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/1GW2UAkeoROM8fQ_lY0T7FLxc-KeENpax/EasyNLP/diffusion/DiffSynth/DiffSynth/pipeline.py", line 54, in forward
    down_samples, mid_sample = controlnet(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/controlnet.py", line 643, in forward
    sample, res_samples = downsample_block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_blocks.py", line 993, in forward
    hidden_states = attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformer_2d.py", line 291, in forward
    hidden_states = block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention.py", line 176, in forward
    hidden_states = attn_output + hidden_states
RuntimeError: The size of tensor a (4608) must match the size of tensor b (6144) at non-singleton dimension 1
Denoising: 0% 0/100 [00:00<?, ?it/s]

fashion_synth.txt
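For context, dimension 1 here is the flattened spatial-token dimension inside the UNet attention layers. Below is a minimal sketch, not repo code, assuming the standard SD 1.5 layout where the latent is 1/8 of the pixel resolution and the first attention block works at full latent resolution; the resolutions shown are only illustrative, not taken from the config.

```python
# Minimal sketch (assumed SD 1.5 layout, illustrative resolutions): the number of
# attention tokens is (pixel_h // 8) * (pixel_w // 8), so two inputs with different
# resolutions produce tensors that cannot be added together.
def attn_tokens(pixel_h, pixel_w):
    latent_h, latent_w = pixel_h // 8, pixel_w // 8
    return latent_h * latent_w

print(attn_tokens(576, 512))  # 4608 tokens (e.g. a 576x512 input, illustrative)
print(attn_tokens(768, 512))  # 6144 tokens (e.g. a 768x512 input, illustrative)
```

This would be consistent with one of the combined inputs being processed at a different resolution than the others, rather than with the frame height and width alone.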

Artiprocher commented 1 year ago

Please use reference_0 instead of reference_03524 in combine_pattern. The number is the id of the reference image, not the file name. Additionally, we recommend that you fine-tune the model on your own video (e.g., a video from this dataset), because the video generated by SD 1.5 is highly unpredictable.
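For clarity, a hypothetical before/after of the relevant entry; the exact config layout depends on the attached fashion_synth.txt, and the other entries of combine_pattern are omitted.

```python
# Hypothetical illustration only: combine_pattern references images by id, not by file name.
combine_pattern = ["reference_03524"]  # wrong: the number is taken from the file name
combine_pattern = ["reference_0"]      # right: 0 is the id of the reference image
```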