EnVision-Research / MotionInversion

Official implementation of 'Motion Inversion For Video Customization'
https://wileewang.github.io/MotionInversion/

AttributeError: 'Tensor' object has no attribute 'config' #3

Open WenshuangSong opened 7 months ago

WenshuangSong commented 7 months ago

when I run "python train.py --config ./configs/config.yaml", I got the flowing error:

File "/home/ubuntu/us/project/MotionInversion/train.py", line 463, in main(config) File "/home/ubuntu/us/project/MotionInversion/train.py", line 407, in main log_validation( File "/home/ubuntu/us/project/MotionInversion/train.py", line 84, in log_validation video_frames = pipeline( File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, kwargs) File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py", line 644, in call prompt_embeds, negative_prompt_embeds = self.encode_prompt( File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py", line 290, in encode_prompt prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=attention_mask) File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward return model_forward(args, kwargs) File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in call return convert_to_fp32(self.model_forward(*args, *kwargs)) File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast return func(args, **kwargs) File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 818, in forward return_dict = return_dict if return_dict is not None else self.config.use_return_dict AttributeError: 'Tensor' object has no attribute 'config'

My versions are diffusers==0.26.3 and transformers==4.27.4. When I print `self` at transformers/models/clip/modeling_clip.py line 818, I find that `self` is a CLIPTextModel when not at a checkpointing step:

```
CLIPTextModel(
  (text_model): CLIPTextTransformer(
    (embeddings): CLIPTextEmbeddings(
      (token_embedding): Embedding(49408, 1024)
      (position_embedding): Embedding(77, 1024)
    )
    (encoder): CLIPEncoder(
      (layers): ModuleList(
        (0-22): 23 x CLIPEncoderLayer(
          (self_attn): CLIPAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): CLIPMLP(
            (activation_fn): GELUActivation()
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (fc2): Linear(in_features=4096, out_features=1024, bias=True)
          )
          (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
      )
    )
    (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
)
```

But "self" is a tensor at checkpointing_steps as follows: tensor([[49406, 320, 31777, 15939, 2528, 320, 1305, 3980, 49407, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0')

wileewang commented 7 months ago

It seems to be related to the version of transformers. Actually, the latest version of transformers also works, so you could give it a try.
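For reference, a quick way to confirm which versions are actually installed in the active environment:

```python
# Print the installed versions, since the error looks version-dependent.
import diffusers
import transformers

print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
```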

WenshuangSong commented 7 months ago

> It seems to be related to the version of transformers. Actually, the latest version of transformers also works, so you could give it a try.

But I have tried transformers==4.39.3, which also doesn't work.

wileewang commented 7 months ago

Does it report the same error?

WenshuangSong commented 7 months ago

> Does it report the same error?

Yes.

WenshuangSong commented 7 months ago

> Does it report the same error?

It fails right after saving the checkpoint at the first checkpointing step.

wileewang commented 7 months ago

It's a little weird. You can disable the checkpointing operation by setting `checkpointing_steps` in config.yaml, then try loading the motion embedding via the inference code. I will check it later.
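For illustration, a rough sketch of that inference path (the base model id, checkpoint path, and injection helper here are my assumptions, not the repo's actual API):

```python
# Sketch only: load the base text-to-video pipeline, load a trained motion
# embedding, and sample. File names and the injection helper are placeholders.
import torch
from diffusers import TextToVideoSDPipeline

pipe = TextToVideoSDPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

motion_embed = torch.load("outputs/motion_embed.pt")  # placeholder path
# inject_motion_embedding(pipe.unet, motion_embed)    # placeholder helper

video_frames = pipe("A knight in armor rides a Segway", num_frames=24).frames
```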

WenshuangSong commented 7 months ago

> It's a little weird. You can disable the checkpointing operation by setting `checkpointing_steps` in config.yaml, then try loading the motion embedding via the inference code. I will check it later.

Yes, I have set checkpointing_steps: 200 and max_train_steps: 200. But will this operation affect the final result? My results are as follows:

"A knight in armor rides a Segway",

https://github.com/EnVision-Research/MotionInversion/assets/18051580/346ce7bc-2319-4585-a8e9-925d6d70d22e

"A cat in armor driving a go-kart",

https://github.com/EnVision-Research/MotionInversion/assets/18051580/1edb071a-fc8a-42e8-ab6b-010af579ad8b

wileewang commented 7 months ago

I can't see your results.

WenshuangSong commented 7 months ago

> I can't see your results.

"A toy train chugs around a roundabout tree" https://github.com/WenshuangSong/file/blob/main/2.mp4

"A cat in armor driving a go-kart", https://github.com/WenshuangSong/file/blob/main/1.mp4

"A knight in armor rides a Segway", https://github.com/WenshuangSong/file/blob/main/tmp_yhbhzx1.mp4

"A teddy bear is riding a tricycle in Times Square" https://github.com/WenshuangSong/file/blob/main/3.mp4

wileewang commented 7 months ago

It looks like you are not using any noise initialization strategy. The quality of video generation strongly depends on the initial noise, as discussed in our paper and other related literature. Since our motion embedding has very few parameters, it is not recommended to use it alone.

Alternatively, if you wish to use the motion embedding purely for video customization, you will need to update config.yaml to enlarge the motion embedding by including 320 in the dim parameter and changing the loss type to BaseLoss. Note that doing so also increases the risk of overfitting.
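For a concrete picture of what a noise initialization strategy can look like, here is one common variant (a sketch under my own assumptions, using the standard diffusers VAE/scheduler interfaces; the paper's exact strategy may differ): encode the source video with the VAE and noise those latents to the first sampling timestep, so denoising starts from the source video's coarse motion layout instead of pure Gaussian noise.

```python
# Sketch of one common noise-initialization variant, not necessarily the
# paper's exact method: start denoising from the source video's noised latents.
import torch

@torch.no_grad()
def init_latents_from_video(pipe, video, num_inference_steps=25):
    # video: (num_frames, 3, H, W) tensor scaled to [-1, 1]
    video = video.to(device=pipe.device, dtype=pipe.vae.dtype)
    latents = pipe.vae.encode(video).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor
    # (f, c, h, w) -> (1, c, f, h, w): the layout the video UNet expects
    latents = latents.permute(1, 0, 2, 3).unsqueeze(0)
    pipe.scheduler.set_timesteps(num_inference_steps, device=latents.device)
    noise = torch.randn_like(latents)
    # Noise all the way to the most-noisy timestep, so appearance is free to
    # follow the prompt while low-frequency motion structure survives.
    return pipe.scheduler.add_noise(latents, noise, pipe.scheduler.timesteps[:1])
```

These latents would then be passed to the pipeline through its `latents` argument instead of letting it sample fresh Gaussian noise.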

WenshuangSong commented 7 months ago

> It looks like you are not using any noise initialization strategy. The quality of video generation strongly depends on the initial noise, as discussed in our paper and other related literature. Since our motion embedding has very few parameters, it is not recommended to use it alone.
>
> Alternatively, if you wish to use the motion embedding purely for video customization, you will need to update config.yaml to enlarge the motion embedding by including 320 in the dim parameter and changing the loss type to BaseLoss. Note that doing so also increases the risk of overfitting.

Thanks for your recommendation. I have now used the noise initialization strategy, with the training input video as the initialization video (I'm not sure whether that is reasonable), but I still can't get a reasonable result. Here is my test setup: I used the longboard-24 video as my training source video, and also as the input for the noise initialization strategy at the inference stage:

https://github.com/WenshuangSong/file/blob/main/longboard-24%20(1).mp4

At the inference stage, my prompt is "A pigeon is strutting around a town square", and my results are as follows:

https://github.com/WenshuangSong/file/blob/main/6.mp4

https://github.com/WenshuangSong/file/blob/main/7.mp4

The results don't seem as reasonable as those on your project page. Did something go wrong? Thanks a lot for your reply.

wileewang commented 6 months ago

I can't diagnose your errors from the results alone. Were you able to run the checkpointing steps successfully in your training? It is recommended to follow that process completely for inference. Also, you can wait for us to release the online Gradio demo if you are still in trouble with the AttributeError.

WenshuangSong commented 6 months ago

> I can't diagnose your errors from the results alone. Were you able to run the checkpointing steps successfully in your training? It is recommended to follow that process completely for inference. Also, you can wait for us to release the online Gradio demo if you are still in trouble with the AttributeError.

No, I can't successfully run the checkpointing steps in my training stage. So, when will the online Gradio demo be released? Thanks~

wileewang commented 6 months ago

We will release it as soon as possible; please be patient. :)