I ran into numerous problems getting this installed.
First, I think your documentation left out the step of creating the models folder. I found this in sd3_infer.py:
# NOTE: Must have folder `models` with the following files:
# - `clip_g.safetensors` (openclip bigG, same as SDXL)
# - `clip_l.safetensors` (OpenAI CLIP-L, same as SDXL)
# - `t5xxl.safetensors` (google T5-v1.1-XXL)
# - `sd3_medium.safetensors` (or whichever main MMDiT model file)
# Also can have
# - `sd3_vae.safetensors` (holds the VAE separately if needed)
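For anyone else hitting this, here is roughly the setup I ended up with. I pieced it together from the comment above, so treat the exact filenames as whatever your copy of sd3_infer.py lists; the rename example reflects the t5xxl issue I describe below.

```python
# Sketch of the models/ layout sd3_infer.py seems to expect (filenames taken
# from the comment above; the download names are whatever Hugging Face gave you).
from pathlib import Path

models = Path("models")
models.mkdir(exist_ok=True)

# The script looks for these exact names, so rename files on the way in, e.g.:
# Path("~/Downloads/t5xxl_F16.safetensors").expanduser().rename(models / "t5xxl.safetensors")
expected = ["clip_g.safetensors", "clip_l.safetensors", "t5xxl.safetensors", "sd3_medium.safetensors"]
missing = [name for name in expected if not (models / name).exists()]
print("still missing:", missing)
```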
Also, to get this to work, I had to run these two installs:
pip install fire safetensors tqdm einops transformers sentencepiece protobuf pillow
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
I think either your link to the t5xxl.safetensors file is wrong or your Python code is. The file I downloaded from Hugging Face was named t5xxl_F16.safetensors, but the app was looking for t5xxl.safetensors. After renaming the file to drop the _F16 suffix, I got to the "Models loaded" point.
Then it started generating the images; it took a long time and then posted this:
(.sd3.5) E:\SD35Turbo.sd3.5\sd3.5>python sd3_infer.py --prompt "cute picture of a dog" --model E:\SD35Turbo\sd3.5_large_turbo.safetensors --width 1920 --height 1080
Loading tokenizers...
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading OpenAI CLIP L...
Loading OpenCLIP bigG...
Loading Google T5-v1-XXL...
Skipping key 'shared.weight' in safetensors file as 'shared' does not exist in python model
Loading SD3 model sd3.5_large_turbo.safetensors...
Loading VAE model...
Models loaded.
Saving images to outputs\sd3.5_large_turbo\cute picture of a dog_2024-11-01T08-58-20
0%| | 0/4 [00:04<?, ?it/s]
0%| | 0/1 [01:40<?, ?it/s]
Traceback (most recent call last):
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 481, in <module>
fire.Fire(main)
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\fire\core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\fire\core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 465, in main
inferencer.gen_image(
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 358, in gen_image
sampled_latent = self.do_sampling(
^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 286, in do_sampling
latent = sample_fn(
^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 285, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 151, in forward
batched = self.model.apply_model(
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 126, in apply_model
return self.model_sampling.calculate_denoised(sigma, model_output, x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 47, in calculate_denoised
return model_input - model_output * sigma
RuntimeError: The size of tensor a (135) must match the size of tensor b (134) at non-singleton dimension 2
--
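While waiting for an answer I did some digging, and I suspect my resolution is the problem. If the VAE downsamples by 8, a height of 1080 becomes a 135-pixel-tall latent, and if the MMDiT patchifies with 2x2 patches (an assumption on my part, not something I verified in the code), an odd dimension would get truncated to 134, which matches the 135-vs-134 mismatch in the error. A quick sanity check along those lines:

```python
# Hedged sanity check: ASSUMES the VAE downsamples by 8 and the model uses 2x2 patches.
def latent_dims(width, height, vae_factor=8, patch=2):
    lw, lh = width // vae_factor, height // vae_factor
    # Both latent dimensions must be divisible by the patch size,
    # i.e. the pixel dimensions must be multiples of vae_factor * patch (16 here).
    return lw, lh, (lw % patch == 0 and lh % patch == 0)

print(latent_dims(1920, 1080))  # (240, 135, False): 135 is odd, so patching drops a row
print(latent_dims(1920, 1088))  # (240, 136, True): nearest multiple-of-16 height is fine
```

If that reasoning is right, generating at 1920x1088 (or any width/height that are multiples of 16) should avoid the crash, but I'd appreciate confirmation.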
Any suggestions on how to fix this, or did I do something wrong?
Thanks