I ran into numerous problems getting this installed.
First, I think your documentation left out the step of creating the models folder. I found this in sd3_infer.py:
# NOTE: Must have folder `models` with the following files:
# - `clip_g.safetensors` (openclip bigG, same as SDXL)
# - `clip_l.safetensors` (OpenAI CLIP-L, same as SDXL)
# - `t5xxl.safetensors` (google T5-v1.1-XXL)
# - `sd3_medium.safetensors` (or whichever main MMDiT model file)
# Also can have
# - `sd3_vae.safetensors` (holds the VAE separately if needed)
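For anyone else hitting this, here is roughly the setup I ended up with. I pieced it together from the comment above, so treat the exact filenames as whatever your copy of sd3_infer.py lists; the rename example reflects the t5xxl issue I describe below.

```python
# Sketch of the models/ layout sd3_infer.py seems to expect (filenames taken
# from the comment above; the download names are whatever Hugging Face gave you).
from pathlib import Path

models = Path("models")
models.mkdir(exist_ok=True)

# The script looks for these exact names, so rename files on the way in, e.g.:
# Path("~/Downloads/t5xxl_F16.safetensors").expanduser().rename(models / "t5xxl.safetensors")
expected = ["clip_g.safetensors", "clip_l.safetensors", "t5xxl.safetensors", "sd3_medium.safetensors"]
missing = [name for name in expected if not (models / name).exists()]
print("still missing:", missing)
```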
Also, to get this to work, I had to run these two installs:
pip install fire safetensors tqdm einops transformers sentencepiece protobuf pillow
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
I think either your link to the t5xxl.safetensors file is wrong or your Python code is. The file I downloaded from Hugging Face was named t5xxl_F16.safetensors, but the app was looking for t5xxl.safetensors. After renaming the file to drop the _F16 suffix, I got to the "Models loaded" point.
Then it started generating the images; it took a long time and then posted this:
(.sd3.5) E:\SD35Turbo.sd3.5\sd3.5>python sd3_infer.py --prompt "cute picture of a dog" --model E:\SD35Turbo\sd3.5_large_turbo.safetensors --width 1920 --height 1080
Loading tokenizers...
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading OpenAI CLIP L...
Loading OpenCLIP bigG...
Loading Google T5-v1-XXL...
Skipping key 'shared.weight' in safetensors file as 'shared' does not exist in python model
Loading SD3 model sd3.5_large_turbo.safetensors...
Loading VAE model...
Models loaded.
Saving images to outputs\sd3.5_large_turbo\cute picture of a dog_2024-11-01T08-58-20
0%| | 0/4 [00:04<?, ?it/s]
0%| | 0/1 [01:40<?, ?it/s]
Traceback (most recent call last):
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 481, in <module>
fire.Fire(main)
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\fire\core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\fire\core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 465, in main
inferencer.gen_image(
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 358, in gen_image
sampled_latent = self.do_sampling(
^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_infer.py", line 286, in do_sampling
latent = sample_fn(
^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 285, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5.sd3.5\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 151, in forward
batched = self.model.apply_model(
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 126, in apply_model
return self.model_sampling.calculate_denoised(sigma, model_output, x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\SD35Turbo.sd3.5\sd3.5\sd3_impls.py", line 47, in calculate_denoised
return model_input - model_output * sigma
RuntimeError: The size of tensor a (135) must match the size of tensor b (134) at non-singleton dimension 2
--
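While waiting for an answer I did some digging, and I suspect my resolution is the problem. If the VAE downsamples by 8, a height of 1080 becomes a 135-pixel-tall latent, and if the MMDiT patchifies with 2x2 patches (an assumption on my part, not something I verified in the code), an odd dimension would get truncated to 134, which matches the 135-vs-134 mismatch in the error. A quick sanity check along those lines:

```python
# Hedged sanity check: ASSUMES the VAE downsamples by 8 and the model uses 2x2 patches.
def latent_dims(width, height, vae_factor=8, patch=2):
    lw, lh = width // vae_factor, height // vae_factor
    # Both latent dimensions must be divisible by the patch size,
    # i.e. the pixel dimensions must be multiples of vae_factor * patch (16 here).
    return lw, lh, (lw % patch == 0 and lh % patch == 0)

print(latent_dims(1920, 1080))  # (240, 135, False): 135 is odd, so patching drops a row
print(latent_dims(1920, 1088))  # (240, 136, True): nearest multiple-of-16 height is fine
```

If that reasoning is right, generating at 1920x1088 (or any width/height that are multiples of 16) should avoid the crash, but I'd appreciate confirmation.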
Any suggestions on how to fix this, or did I do something wrong?
Thanks