WenshuangSong opened this issue 7 months ago
It seems to be related to the version of transformers. Actually, the latest version of transformers also works. You can give it a try.
But I have tried transformers 4.39.3, which also doesn't work.
Does it report the same error?
Yes, it reports the same error, right after the checkpoint is saved at the first checkpointing_step.
It's a little weird. You can disable the checkpointing operation by setting checkpointing_steps in config.yaml and try to load the motion embedding via the inference code. I will check it later.
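For reference, a minimal sketch of what loading a saved motion embedding at inference could look like in plain PyTorch. The load_motion_embedding helper, the file name, and the assumption that the embedding is saved as a plain state_dict are illustrative only, not the repository's actual API:

import torch

def load_motion_embedding(module: torch.nn.Module, path: str,
                          device: str = "cuda", dtype=torch.float16):
    # Hypothetical helper: assumes the training run saved the motion embedding
    # as a plain state_dict; adjust the path and keys to what your run produced.
    state = torch.load(path, map_location=device)
    module.load_state_dict(state)
    # Match the inference pipeline's device/dtype before sampling.
    return module.to(device=device, dtype=dtype)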
Yes, I have set checkpointing_steps: 200 and max_train_steps: 200. But will this operation affect the final result? My results are as follows:
"A knight in armor rides a Segway",
"A cat in armor driving a go-kart",
I can't see your results.
"A toy train chugs around a roundabout tree" https://github.com/WenshuangSong/file/blob/main/2.mp4
"A cat in armor driving a go-kart", https://github.com/WenshuangSong/file/blob/main/1.mp4
"A knight in armor rides a Segway", https://github.com/WenshuangSong/file/blob/main/tmp_yhbhzx1.mp4
"A teddy bear is riding a tricycle in Times Square" https://github.com/WenshuangSong/file/blob/main/3.mp4
It looks like you are not using any noise initialization strategy. The quality of video model generation strongly depends on the initial noise, which is discussed in our paper and other related literature. Since our motion embedding parameters are very limited, it is not recommended to use it alone.
Alternatively, if you wish to use motion embedding purely for video customization, you will need to update config.yaml to enlarge the size of the motion embedding by including 320 into the dim parameter and change the loss type to BaseLoss. Note that doing so also increases the risk of overfitting.
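As a concrete illustration of what a noise initialization strategy can look like, here is a minimal SDEdit-style sketch in diffusers-flavored PyTorch: encode the source video into latents, then push them to the scheduler's noisiest timestep so sampling starts from structure-aware noise instead of pure Gaussian noise. Whether MotionInversion uses this exact scheme, and the init_latents_from_video name, are assumptions for illustration only:

import torch

def init_latents_from_video(vae, scheduler, video_frames,
                            device="cuda", dtype=torch.float16):
    # video_frames: (num_frames, channels, height, width) tensor scaled to [-1, 1].
    # Assumes scheduler.set_timesteps(...) has already been called.
    frames = video_frames.to(device=device, dtype=dtype)
    # Encode each frame with the VAE and apply the latent scaling factor.
    latents = vae.encode(frames).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t_max = scheduler.timesteps[:1].to(device)  # the noisiest timestep
    # Diffuse the source latents to t_max and use the result as initial noise.
    return scheduler.add_noise(latents, noise, t_max)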
Thanks for your recommendation. I have used the noise initialization strategy, and I used the training input video as the initialization video, though I am not sure whether that is reasonable. But I still can't get a reasonable result. Here is my test setup: I used the longboard-24 video as my training source video, which was also used as the input for the noise initialization strategy at the inference stage:
https://github.com/WenshuangSong/file/blob/main/longboard-24%20(1).mp4
At the inference stage, my prompt is "A pigeon is strutting around a town square", and my results are as follows: https://github.com/WenshuangSong/file/blob/main/6.mp4 https://github.com/WenshuangSong/file/blob/main/7.mp4
It doesn't seem as reasonable as the results on your project page. Did something go wrong? Thanks a lot for your reply.
I can't check your errors based on the results alone. Were you able to successfully run the checkpointing steps in your training? It is recommended to follow this process completely for inference. Also, you can wait for us to release the online Gradio demo if you are still in trouble with the AttributeError.
No, I can't successfully run the checkpointing steps in my training stage. So, when will the online Gradio demo be released? Thanks~
We will release it as soon as possible; please be patient. :)
When I run "python train.py --config ./configs/config.yaml", I get the following error:
File "/home/ubuntu/us/project/MotionInversion/train.py", line 463, in
main(config)
File "/home/ubuntu/us/project/MotionInversion/train.py", line 407, in main
log_validation(
File "/home/ubuntu/us/project/MotionInversion/train.py", line 84, in log_validation
video_frames = pipeline(
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py", line 644, in call
prompt_embeds, negative_prompt_embeds = self.encode_prompt(
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py", line 290, in encode_prompt
prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=attention_mask)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, *kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
return model_forward(args, kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in call
return convert_to_fp32(self.model_forward(*args, *kwargs))
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(args, **kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 818, in forward
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
AttributeError: 'Tensor' object has no attribute 'config'
And my diffusers==0.26.3, transformers==4.27.4. When I print "self" in File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 818, I find that self is a CLIPTextModel when not at a checkpointing step, as follows:

CLIPTextModel(
  (text_model): CLIPTextTransformer(
    (embeddings): CLIPTextEmbeddings(
      (token_embedding): Embedding(49408, 1024)
      (position_embedding): Embedding(77, 1024)
    )
    (encoder): CLIPEncoder(
      (layers): ModuleList(
        (0-22): 23 x CLIPEncoderLayer(
          (self_attn): CLIPAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): CLIPMLP(
            (activation_fn): GELUActivation()
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (fc2): Linear(in_features=4096, out_features=1024, bias=True)
          )
          (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
      )
    )
    (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
)
But "self" is a tensor at checkpointing_steps as follows: tensor([[49406, 320, 31777, 15939, 2528, 320, 1305, 3980, 49407, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0')