Picsart-AI-Research / StreamingT2V

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
https://streamingt2v.github.io/

No output produced #47

Closed: nijinekoyo closed this issue 3 months ago

nijinekoyo commented 5 months ago

After running `python inference.py --prompt="A cat running on the street"`, the results folder stays empty, and the log below shows no error:

> python inference.py --prompt="A cat running on the street"
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "C:\Users\xxx\anaconda3\envs\st2v\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
    from xformers.triton.softmax import softmax as triton_softmax  # noqa
  File "C:\Users\xxx\anaconda3\envs\st2v\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
    import triton
ModuleNotFoundError: No module named 'triton'
C:\Users\xxx\anaconda3\envs\st2v\lib\site-packages\diffusers\models\transformer_temporal.py:24: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
C:\Users\xxx\anaconda3\envs\st2v\lib\site-packages\diffusers\models\transformer_temporal.py:29: FutureWarning: `TransformerTemporalModel` is deprecated and will be removed in version 0.29. Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead.
  deprecate("TransformerTemporalModel", "0.29", deprecation_message)
C:\Users\xxx\anaconda3\envs\st2v\lib\site-packages\diffusers\models\transformer_temporal.py:34: FutureWarning: `TransformerTemporalModelOutput` is deprecated and will be removed in version 0.29. Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead.
  deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
2024-06-11 20:10:00,402 - modelscope - INFO - PyTorch version 2.3.0+cu121 Found.
2024-06-11 20:10:00,404 - modelscope - INFO - Loading ast index from C:\Users\xxx\.cache\modelscope\ast_indexer
2024-06-11 20:10:00,507 - modelscope - INFO - Loading done! Current index file version is 1.9.0, with md5 51a5aadc1abaa95a53fd1c6559852d3e and a total number of 921 components indexed
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00,  5.43it/s]
It seems like you have activated model offloading by calling `enable_model_cpu_offload`, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, text_encoder, tokenizer, unet, scheduler to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: `pipeline.to('cpu')` or removing the move altogether if you use offloading.
2024-06-11 20:10:17,816 - modelscope - INFO - Use user-specified model revision: v1.1.0
2024-06-11 20:10:18,206 - modelscope - INFO - initiate model from C:\Users\xxx\.cache\modelscope\hub\damo\Video-to-Video
2024-06-11 20:10:18,207 - modelscope - INFO - initiate model from location C:\Users\xxx\.cache\modelscope\hub\damo\Video-to-Video.
2024-06-11 20:10:18,212 - modelscope - INFO - initialize model from C:\Users\xxxx\.cache\modelscope\hub\damo\Video-to-Video
2024-06-11 20:10:37,058 - modelscope - INFO - Build encoder with FrozenOpenCLIPEmbedder
2024-06-11 20:11:57,502 - modelscope - INFO - Load model Vid2VidSDUNet path C:\Users\xxxx\.cache\modelscope\hub\damo\Video-to-Video\non_ema_0035000.pth, with local status <All keys matched successfully>
2024-06-11 20:11:57,622 - modelscope - INFO - Build diffusion with type of GaussianDiffusion_SDEdit
2024-06-11 20:11:57,936 - modelscope - INFO - Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
2024-06-11 20:12:22,524 - modelscope - INFO - Restored from C:\Users\xxxx\.cache\modelscope\hub\damo\Video-to-Video\v2-1_512-ema-pruned.ckpt
2024-06-11 20:12:26,077 - modelscope - INFO - Registering forward hook for:
2024-06-11 20:12:26,092 - modelscope - INFO -           'input_blocks.0.1.0.attn1' has now 1 hooks
2024-06-11 20:12:26,094 - modelscope - INFO -           'input_blocks.0.1.0.attn2' has now 1 hooks
2024-06-11 20:12:26,102 - modelscope - INFO -           'input_blocks.1.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,103 - modelscope - INFO -           'input_blocks.1.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,106 - modelscope - INFO -           'input_blocks.2.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,106 - modelscope - INFO -           'input_blocks.2.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,108 - modelscope - INFO -           'input_blocks.4.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,108 - modelscope - INFO -           'input_blocks.4.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,110 - modelscope - INFO -           'input_blocks.5.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,111 - modelscope - INFO -           'input_blocks.5.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,113 - modelscope - INFO -           'input_blocks.7.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,113 - modelscope - INFO -           'input_blocks.7.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,115 - modelscope - INFO -           'input_blocks.8.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,116 - modelscope - INFO -           'input_blocks.8.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,118 - modelscope - INFO -           'middle_block.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,119 - modelscope - INFO -           'middle_block.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,124 - modelscope - INFO -           'output_blocks.3.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,124 - modelscope - INFO -           'output_blocks.3.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,126 - modelscope - INFO -           'output_blocks.4.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,126 - modelscope - INFO -           'output_blocks.4.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,128 - modelscope - INFO -           'output_blocks.5.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,129 - modelscope - INFO -           'output_blocks.5.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,132 - modelscope - INFO -           'output_blocks.6.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,134 - modelscope - INFO -           'output_blocks.6.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,136 - modelscope - INFO -           'output_blocks.7.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,137 - modelscope - INFO -           'output_blocks.7.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,139 - modelscope - INFO -           'output_blocks.8.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,139 - modelscope - INFO -           'output_blocks.8.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,141 - modelscope - INFO -           'output_blocks.9.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,141 - modelscope - INFO -           'output_blocks.9.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,143 - modelscope - INFO -           'output_blocks.10.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,143 - modelscope - INFO -           'output_blocks.10.2.0.attn2' has now 1 hooks
2024-06-11 20:12:26,145 - modelscope - INFO -           'output_blocks.11.2.0.attn1' has now 1 hooks
2024-06-11 20:12:26,146 - modelscope - INFO -           'output_blocks.11.2.0.attn2' has now 1 hooks
2024-06-11 20:12:27,232 - modelscope - WARNING - No preprocessor field found in cfg.
2024-06-11 20:12:27,232 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-06-11 20:12:27,234 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\\Users\\xxxx\.cache\\modelscope\\hub\\damo\\Video-to-Video'}. trying to build by task and model information.
2024-06-11 20:12:27,235 - modelscope - WARNING - No preprocessor key ('video-to-video-model', 'video-to-video') found in PREPROCESSOR_MAP, skip building preprocessor.
Global seed set to 33
Base pipeline from: damo-vilab/text-to-video-ms-1.7b
Pipeline class t2v_enhanced.model.model.controlnet.pipeline_text_to_video_w_controlnet_synth.TextToVideoSDPipeline
self.merging_mode attention_cross_attention
Call extend channel loader with conv_in.
Call extend channel loader with conv_out.
Some weights of UNet3DConditionModel were not initialized from the model checkpoint at damo-vilab/text-to-video-ms-1.7b and are newly initialized: ['cross_attention_merger_down_blocks.5.temporal_transformer.attention.to_k.weight', 'cross_attention_merger_mid_block.temporal_transformer.attention.to_k.weight', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.7.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.4.temporal_transformer.proj_in.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.1.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.0.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.8.temporal_transformer.norm.bias', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.10.temporal_transformer.proj_out.weight', 'cross_attention_merger_mid_block.temporal_transformer.norm.bias', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.11.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.1.temporal_transformer.norm.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.5.temporal_transformer.proj_out.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.2.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.2.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.5.temporal_transformer.norm.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.conv_ln.bias', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.proj_in.bias', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.1.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.11.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.3.temporal_transformer.attention.to_out.0.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.6.temporal_transformer.attention.to_out.0.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.conv.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_down_blocks.6.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_mid_block.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.attention.to_out.0.weight', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.3.temporal_transformer.proj_out.weight', 'cross_attention_merger_down_blocks.8.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.0.temporal_transformer.attention.to_k.weight', 'mid_block.attentions.0.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_down_blocks.7.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.1.temporal_transformer.proj_out.weight', 
'cross_attention_merger_down_blocks.0.temporal_transformer.proj_out.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_mid_block.temporal_transformer.proj_in.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.conv.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_mid_block.temporal_transformer.proj_out.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.conv.bias', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.conv_ln.bias', 'mid_block.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.6.temporal_transformer.proj_out.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.proj_out.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.8.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.9.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.9.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.5.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.5.temporal_transformer.attention.to_out.0.weight', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.0.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.7.temporal_transformer.proj_out.bias', 'cross_attention_merger_mid_block.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.attention.to_k.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.11.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.1.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.3.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.11.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.6.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.8.temporal_transformer.attention.to_q.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.conv.bias', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.10.temporal_transformer.proj_in.bias', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.6.temporal_transformer.attention.to_k.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.1.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.2.temporal_transformer.proj_out.weight', 'cross_attention_merger_mid_block.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.2.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.2.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.5.temporal_transformer.norm.bias', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.attention.to_q.weight', 
'cross_attention_merger_down_blocks.6.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.11.temporal_transformer.proj_out.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.alpha', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.conv.bias', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.conv_ln.bias', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_down_blocks.9.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.6.temporal_transformer.norm.bias', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.4.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.2.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.proj_out.bias', 'cross_attention_merger_mid_block.temporal_transformer.proj_in.bias', 'mid_block.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.9.temporal_transformer.attention.to_q.weight', 'mid_block.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.9.temporal_transformer.attention.to_out.0.bias', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.conv_ln.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.1.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.2.temporal_transformer.attention.to_k.weight', 'cross_attention_merger_down_blocks.11.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.6.temporal_transformer.norm.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.5.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.5.temporal_transformer.proj_out.bias', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.conv_ln.bias', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_down_blocks.11.temporal_transformer.attention.to_q.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.conv.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.11.temporal_transformer.attention.to_v.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.3.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.10.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.0.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_mid_block.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.6.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.1.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.4.temporal_transformer.norm.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.alpha', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.11.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.9.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.8.temporal_transformer.attention.to_out.0.weight', 
'cross_attention_merger_down_blocks.8.temporal_transformer.attention.to_out.0.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.4.temporal_transformer.attention.to_out.0.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.6.temporal_transformer.proj_in.bias', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_down_blocks.4.temporal_transformer.proj_in.bias', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.conv_ln.bias', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.conv.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.conv.bias', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.alpha', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.conv_ln.weight', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.conv.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.4.temporal_transformer.norm.bias', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.0.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.9.temporal_transformer.proj_out.weight', 'cross_attention_merger_down_blocks.2.temporal_transformer.proj_out.bias', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.8.temporal_transformer.proj_out.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.conv.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.1.temporal_transformer.norm.bias', 'cross_attention_merger_down_blocks.11.temporal_transformer.attention.to_k.weight', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.2.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.8.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.10.temporal_transformer.attention.to_k.weight', 'cross_attention_merger_down_blocks.4.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.4.temporal_transformer.attention.to_q.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.conv.bias', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.conv_ln.weight', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.8.temporal_transformer.attention.to_k.weight', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.9.temporal_transformer.norm.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.conv.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.5.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.11.temporal_transformer.proj_in.bias', 'cross_attention_merger_down_blocks.0.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.8.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.3.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.8.temporal_transformer.proj_out.bias', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.conv.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.conv.weight', 
'mid_block.attentions.0.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.7.temporal_transformer.proj_in.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.4.temporal_transformer.attention.to_out.0.bias', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.5.temporal_transformer.proj_in.bias', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.9.temporal_transformer.attention.to_k.weight', 'cross_attention_merger_down_blocks.2.temporal_transformer.proj_in.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.conv_ln.bias', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.conv.bias', 'cross_attention_merger_down_blocks.4.temporal_transformer.attention.to_k.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.conv.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_mid_block.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.2.temporal_transformer.attention.to_out.0.weight', 'cross_attention_merger_down_blocks.7.temporal_transformer.attention.to_k.weight', 'cross_attention_merger_down_blocks.0.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.0.temporal_transformer.proj_in.weight', 'cross_attention_merger_mid_block.temporal_transformer.proj_out.bias', 'cross_attention_merger_down_blocks.4.temporal_transformer.proj_out.weight', 'cross_attention_merger_down_blocks.6.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.9.temporal_transformer.proj_in.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.conv_ln.bias', 'cross_attention_merger_down_blocks.0.temporal_transformer.attention.to_v.weight', 'cross_attention_merger_down_blocks.10.temporal_transformer.norm.weight', 'cross_attention_merger_down_blocks.0.temporal_transformer.attention.to_q.weight', 'cross_attention_merger_down_blocks.1.temporal_transformer.proj_in.weight', 'cross_attention_merger_down_blocks.1.temporal_transformer.attention.to_k.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.conv_ln.weight', 'cross_attention_merger_down_blocks.9.temporal_transformer.attention.to_v.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.alpha', 'cross_attention_merger_down_blocks.5.temporal_transformer.attention.to_out.0.bias', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.conv_ln.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PIPE LOADING DONE
CUSTOM XFORMERS ATTENTION USED.
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
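
A few notes on what I can tell from the log, in case they help. The `No module named 'triton'` traceback at the top appears non-fatal: Triton has no official Windows wheels, so xformers just disables its Triton kernels and falls back to other attention backends. A minimal probe (my own sketch, arbitrary tensor shapes) that convinced me xformers itself still works without Triton:

```python
# Sketch: confirm xformers memory-efficient attention runs without Triton.
# xformers expects tensors shaped (batch, seq_len, heads, head_dim).
import torch
import xformers.ops as xops

q = torch.randn(1, 16, 8, 64, device="cuda", dtype=torch.float16)
out = xops.memory_efficient_attention(q, q, q)  # selects a non-Triton kernel
print(out.shape)  # torch.Size([1, 16, 8, 64])
```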
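The `FutureWarning`s about `transformer_temporal` come from diffusers itself and should be harmless here; note that diffusers' own warning text misspells the suggested path (`tranformer_temporal`). Assuming a diffusers version that already ships the relocated `transformers` subpackage, the non-deprecated import would be:

```python
# Assumed replacement import for the deprecation warnings above
# (the warning's suggested path contains a typo: "tranformer_temporal").
from diffusers.models.transformers.transformer_temporal import (
    TransformerTemporalModel,
    TransformerTemporalModelOutput,
)
```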
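The offloading warning suggests the pipeline calls `enable_model_cpu_offload()` and is then moved to CUDA manually. I have not traced where this happens in the repo; the snippet below is only a hypothetical reduction of the pattern the diffusers warning describes, not the project's actual code:

```python
# Hypothetical pattern behind the warning: offloading plus a manual device move.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # components migrate to the GPU on demand
pipe.to("cuda")  # a manual move like this defeats offloading (hence the warning)
```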
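Finally, the run ends right after the Lightning device report, with no sampling progress bar and no traceback. To rule out a silent crash, one thing worth checking is the process exit code; a small wrapper (my own sketch, not part of the repo) makes it visible:

```python
# Sketch: run inference.py as a subprocess and report its exit code.
# A non-zero code would indicate the process died without printing an error.
import subprocess
import sys

proc = subprocess.run(
    [sys.executable, "inference.py", "--prompt", "A cat running on the street"]
)
print("exit code:", proc.returncode)
```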