Tencent / HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
https://dit.hunyuan.tencent.com/

🍊 Jupyter Notebook #126

Closed · camenduru closed this issue 4 days ago

camenduru commented 5 days ago

Thanks for the project ❤️ I made a Jupyter notebook 🥳 I hope you like it.

https://github.com/camenduru/HunyuanDiT-jupyter

camenduru commented 5 days ago

Hi 👋 @Jarvis73

I am getting this error with the new model: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2

/content/HunyuanDiT
2024-06-30 17:02:33.728105: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-30 17:02:33.728163: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-30 17:02:33.729733: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-30 17:02:33.737713: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-30 17:02:34.956420: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
flash_attn import failed: No module named 'flash_attn'
[2024-06-30 17:02:38,110] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  NVIDIA Inference is only supported on Ampere and newer architectures
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-06-30 17:02:39.955 | INFO     | hydit.inference:__init__:160 - Got text-to-image model root path: ckpts/t2i
2024-06-30 17:02:39.955 | INFO     | hydit.inference:__init__:169 - Loading CLIP Text Encoder...
2024-06-30 17:02:47.315 | INFO     | hydit.inference:__init__:172 - Loading CLIP Text Encoder finished
2024-06-30 17:02:47.315 | INFO     | hydit.inference:__init__:175 - Loading CLIP Tokenizer...
2024-06-30 17:02:47.369 | INFO     | hydit.inference:__init__:178 - Loading CLIP Tokenizer finished
2024-06-30 17:02:47.369 | INFO     | hydit.inference:__init__:181 - Loading T5 Text Encoder and T5 Tokenizer...
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:560: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
You are using a model of type mt5 to instantiate a model of type t5. This is not supported for all configurations of models and can yield errors.
2024-06-30 17:03:25.810 | INFO     | hydit.inference:__init__:185 - Loading t5_text_encoder and t5_tokenizer finished
2024-06-30 17:03:25.810 | INFO     | hydit.inference:__init__:188 - Loading VAE...
2024-06-30 17:03:27.499 | INFO     | hydit.inference:__init__:191 - Loading VAE finished
2024-06-30 17:03:27.499 | INFO     | hydit.inference:__init__:195 - Building HunYuan-DiT model...
2024-06-30 17:03:28.110 | INFO     | hydit.modules.models:__init__:239 -     Number of tokens: 4096
2024-06-30 17:03:52.716 | INFO     | hydit.inference:__init__:216 - Loading torch model ckpts/t2i/model/pytorch_model_ema.pt...
Traceback (most recent call last):
  File "/content/HunyuanDiT/sample_t2i.py", line 31, in <module>
    args, gen, enhancer = inferencer()
  File "/content/HunyuanDiT/sample_t2i.py", line 17, in inferencer
    gen = End2End(args, models_root_path)
  File "/content/HunyuanDiT/hydit/inference.py", line 218, in __init__
    self.model.load_state_dict(state_dict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HunYuanDiT:
    Missing key(s) in state_dict: "style_embedder.weight". 
    size mismatch for extra_embedder.0.weight: copying a param with shape torch.Size([5632, 1024]) from checkpoint, the shape in current model is torch.Size([5632, 3968]).
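
For anyone hitting the same thing, a quick way to confirm the mismatch is to inspect the checkpoint directly. A minimal diagnostic sketch (the checkpoint path is taken from the log above; the rest is illustrative, not part of the repo):

```python
import torch

# Load the EMA checkpoint on CPU (path taken from the log above).
state_dict = torch.load("ckpts/t2i/model/pytorch_model_ema.pt", map_location="cpu")

# Check the two parameters named in the error message.
for name in ("style_embedder.weight", "extra_embedder.0.weight"):
    shape = tuple(state_dict[name].shape) if name in state_dict else "missing"
    print(name, "->", shape)
```

With the v1.2 weights this should print `style_embedder.weight -> missing` and `extra_embedder.0.weight -> (5632, 1024)`, i.e. the checkpoint was saved from a newer architecture than the one the current code builds.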
C0nsumption commented 5 days ago

Same here, any luck? Seems like something might be missing from the Hugging Face repo.

zml-ai commented 4 days ago

The v1.2 code is currently under internal review and will be released soon. The updates involve minor architectural modifications: the size cond and style embedder have been removed, and the scheduler's beta end has been adjusted to 0.018.
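
That matches the traceback above: the v1.2 weights were trained without those two conditioning inputs, so they no longer fit a model built from the v1.1 code. A back-of-the-envelope check of the shapes, using constants from the public hydit model code (assumptions on my part, not from the unreleased v1.2 code):

```python
# Width of extra_embedder's input in HunyuanDiT-g/2 (hidden_size = 1408).
pooled_clip = 1024      # pooled CLIP text embedding
size_cond   = 6 * 256   # image_meta_size: 6 values, 256-dim sinusoidal each
style_emb   = 1408      # style_embedder: nn.Embedding(1, hidden_size)

# Model built by the v1.0/v1.1 code:
print(pooled_clip + size_cond + style_emb)  # 3968, what the model expects

# v1.2 checkpoint, with size cond and style embedder removed:
print(pooled_clip)                          # 1024, what the checkpoint holds
```

The output width 5632 = 4 × 1408 is unchanged, which is why only the input dimension mismatches. In short, the v1.2 weights need the v1.2 code.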

camenduru commented 4 days ago

thanks ❤