Tencent / MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
https://tencent.github.io/MimicMotion/

About training the MimicMotion model #55

Open wtjiang98 opened 1 month ago

wtjiang98 commented 1 month ago

I am trying to implement training code based on https://github.com/pixeli99/SVD_Xtend/blob/main/train_svd.py, because the inference code of MimicMotion is quite similar to SVD_Xtend's.

However, when I train the model I get the following blurry results. I used an 8-bit optimizer (8bitOpt), turned on gradient checkpointing, and set num_frames to 12. Discussion is welcome if you are also trying to implement the training code or have already finished it. Thank you so much.

[image: blurry generated frames]
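For readers trying the same setup, the run described above can be written down as a small config. This is a minimal sketch under stated assumptions: the field names are hypothetical (they are not SVD_Xtend's actual argument names), 576x1024 is assumed only because it is SVD's native resolution, and the latent layout assumes the SVD VAE (4 latent channels, 8x spatial downsampling) with frames stacked as (batch, frames, channels, height, width). It shows how num_frames directly scales the tensor the UNet has to denoise, which is why memory savers such as an 8-bit optimizer (in diffusers-style training scripts, typically bitsandbytes' AdamW8bit) and gradient checkpointing come into play.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical settings mirroring the run described above (illustrative names)."""
    num_frames: int = 12
    height: int = 576      # assumption: SVD's native resolution
    width: int = 1024
    batch_size: int = 1
    use_8bit_adam: bool = True
    gradient_checkpointing: bool = True


# Assumption: the SVD VAE encodes each frame to 4 channels at 1/8 spatial size.
LATENT_CHANNELS = 4
VAE_DOWNSAMPLE = 8


def latent_shape(cfg: TrainConfig) -> tuple:
    """Shape of the noisy video latents the UNet denoises: (B, F, C, H/8, W/8)."""
    return (
        cfg.batch_size,
        cfg.num_frames,
        LATENT_CHANNELS,
        cfg.height // VAE_DOWNSAMPLE,
        cfg.width // VAE_DOWNSAMPLE,
    )


print(latent_shape(TrainConfig()))  # (1, 12, 4, 72, 128)
```

Every extra frame adds another 4x72x128 latent slice (plus its activations in the temporal layers), so memory grows roughly linearly with num_frames.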

RecordK commented 1 month ago

Would it be possible for you to kindly share the code you have implemented?

Bowen-Jian commented 1 month ago

Hello, I am also trying to modify this project to train the MimicMotion model. If possible, could you upload your training code to GitHub so that we can discuss and debug it together?

Bowen-Jian commented 1 month ago

Another project you can refer to is https://github.com/MooreThreads/Moore-AnimateAnyone. One of the modifications would be replacing the network architecture with the UNet.

zyh-uaiaaaa commented 1 month ago

I have a similar problem. If I use train_svd.py to train SVD alone, everything is fine, but the results get worse after adding the posenet. Any solutions?

wtjiang98 commented 1 month ago

Quoting an earlier comment: "In fact, a simple way to validate is to continue training on their pre-trained model to verify the correctness of your code logic. That's what I did. Then I retrained this model, and after a few epochs, I almost reproduced the model with slightly less clarity. I encountered and resolved many difficulties during the reproduction process. Therefore, if you share your code, I can point out where the problems might be."

Thank you very much for your kindness, but I am not allowed to share the code due to my affiliation. One observation: training works quite well on a single GPU, but produces blue, blurry results on more than one GPU. I am still trying to solve it.

Jie-zju commented 1 month ago

Try loading the pretrained MimicMotion model and continuing training from it; running inference only with MimicMotion may also be a good check.

adf1178 commented 1 month ago

Hello! We ran into a similar problem: the generated videos become blue blurs. Could you please tell me which VAE version you are using?

wtjiang98 commented 1 month ago

I did not change the model structure. I use the VAE from Stable Video Diffusion.

wtjiang98 commented 1 month ago

Hey! I solved this problem by using accelerator.unwrap_model() correctly. The blue, blurry results appeared because I called it on the whole model, accelerator.unwrap_model(mimicmotion_model); it seems the unwrapping is not applied to the submodules. Unwrapping each submodule instead, accelerator.unwrap_model(mimicmotion_model.unet) and accelerator.unwrap_model(mimicmotion_model.posenet), works for me.
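The fix above can be illustrated without a GPU. The sketch below is a pure-Python toy, not Accelerate's code: ParallelWrapper stands in for a data-parallel wrapper such as PyTorch's DistributedDataParallel, which keeps the original network in a `.module` attribute, and Pipeline, Net, unwrap, and param_names are hypothetical helpers. It assumes, as the comment suggests, that unet and posenet were each wrapped separately, so unwrapping only the container leaves the wrappers in place and every parameter name picks up an extra `module.` segment.

```python
# Toy model (pure Python, no torch) of why unwrapping only the container
# can leave data-parallel wrappers on its submodules.


class ParallelWrapper:
    """Stand-in for a wrapper like DistributedDataParallel (stores `.module`)."""
    def __init__(self, module):
        self.module = module


class Net:
    """Hypothetical stand-in for a trainable network with one parameter."""
    def __init__(self, value):
        self.weight = value


class Pipeline:
    """Hypothetical container holding separately wrapped submodules."""
    def __init__(self, unet, posenet):
        self.unet = unet
        self.posenet = posenet


def unwrap(model):
    """Strip wrappers from the object passed in only; its attributes are untouched."""
    while isinstance(model, ParallelWrapper):
        model = model.module
    return model


def param_names(obj, prefix=""):
    """Flatten parameter names the way nested state-dict keys are built."""
    if isinstance(obj, ParallelWrapper):
        return param_names(obj.module, prefix + "module.")
    if isinstance(obj, Net):
        return [prefix + "weight"]
    names = []
    for attr in ("unet", "posenet"):
        names += param_names(getattr(obj, attr), prefix + attr + ".")
    return names


# Multi-GPU case: each trainable submodule was wrapped on its own.
model = Pipeline(ParallelWrapper(Net(1.0)), ParallelWrapper(Net(2.0)))

# Unwrapping the container is a no-op, so the inner wrappers survive.
naive = param_names(unwrap(model))
# Unwrapping each submodule removes the extra "module." segment.
fixed = param_names(unwrap(model.unet), "unet.") + param_names(unwrap(model.posenet), "posenet.")

print(naive)  # ['unet.module.weight', 'posenet.module.weight']
print(fixed)  # ['unet.weight', 'posenet.weight']
```

In real Accelerate code the corresponding call is accelerator.unwrap_model(...), applied to each module that was passed to accelerator.prepare(...) (here mimicmotion_model.unet and mimicmotion_model.posenet) before saving weights or running inference. Since a single-process run does not add a data-parallel wrapper, this would also be consistent with the single-GPU run being unaffected. One plausible way the mismatch becomes blurry output rather than an error is a non-strict checkpoint load that silently skips the `module.`-prefixed keys; that is an assumption, not something confirmed in this thread.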