Tencent / MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
https://tencent.github.io/MimicMotion/

About training the MimicMotion model #55

Open wtjiang98 opened 4 months ago

wtjiang98 commented 4 months ago

I am trying to implement the training code based on https://github.com/pixeli99/SVD_Xtend/blob/main/train_svd.py, since the inference code of MimicMotion is quite similar to SVD_Xtend's.

However, when I trained the model I got the blurry results below. I used the 8-bit optimizer, turned on gradient checkpointing, and set num_frames to 12. Discussion is welcome if you are also trying to implement the training code or have already finished it. Thank you so much.

[image: blurry generated results]

RecordK commented 4 months ago

Would it be possible for you to kindly share the code you have implemented?

Bowen-Jian commented 4 months ago

Hello, I am also trying to modify this project to train the MimicMotion model. If possible, could you upload your training code to GitHub so that we can discuss and debug it together?

Bowen-Jian commented 4 months ago

Another project you can refer to is https://github.com/MooreThreads/Moore-AnimateAnyone. One of the needed modifications is replacing the network architecture with a UNet.

zyh-uaiaaaa commented 4 months ago

I got a similar problem. If I use [train_svd.py] to train only SVD, everything is fine, but the results become bad after adding the PoseNet. Any solutions?

wtjiang98 commented 4 months ago

In fact, a simple way to validate your code logic is to continue training from their pre-trained model. That's what I did. Then I retrained the model from scratch, and after a few epochs I almost reproduced it, with slightly less clarity. I encountered and resolved many difficulties during the reproduction process, so if you share your code, I can point out where the problems might be.
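That validation step can be sketched as below; the checkpoint name and module layout are placeholders, not the repo's actual API. Loading with `strict=False` reports mismatched keys instead of failing, which quickly exposes wiring bugs in the training loop:

```python
import os
import tempfile
import torch

# Minimal stand-in for the MimicMotion model (hypothetical layout).
class TinyMimicMotion(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.unet = torch.nn.Linear(4, 4)
        self.pose_net = torch.nn.Linear(4, 4)

def resume_from_pretrained(model, ckpt_path):
    """Initialize from released weights before training; returns any
    missing/unexpected keys so architecture mismatches surface early."""
    state = torch.load(ckpt_path, map_location="cpu")
    result = model.load_state_dict(state, strict=False)
    return result.missing_keys, result.unexpected_keys

# Demo round-trip: a freshly saved state dict stands in for the release.
path = os.path.join(tempfile.mkdtemp(), "mimicmotion_demo.pth")
torch.save(TinyMimicMotion().state_dict(), path)
missing, unexpected = resume_from_pretrained(TinyMimicMotion(), path)
# Both lists are empty when the architectures match.
```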

Thank you very much for your kindness, but the code cannot be shared due to my affiliation. One observation: training works well on one GPU but produces blue, blurry results on more than one GPU. I am still trying to solve it.

Jie-zju commented 3 months ago

> In fact, a simple way to validate your code logic is to continue training from their pre-trained model. That's what I did. Then I retrained the model from scratch, and after a few epochs I almost reproduced it, with slightly less clarity. I encountered and resolved many difficulties during the reproduction process, so if you share your code, I can point out where the problems might be.

> Thank you very much for your kindness, but the code cannot be shared due to my affiliation. One observation: training works well on one GPU but produces blue, blurry results on more than one GPU. I am still trying to solve it.

Loading the pretrained MimicMotion weights and continuing training and inference from them may work well.

adf1178 commented 3 months ago

Hello! We ran into a similar problem where generated videos become blue blurs. Could you please tell me what version of the VAE you are using?

wtjiang98 commented 3 months ago

> Hello! We ran into a similar problem where generated videos become blue blurs. Could you please tell me what version of the VAE you are using?

I did not change the model structure. I use the VAE from Stable Video Diffusion.

wtjiang98 commented 3 months ago

> In fact, a simple way to validate your code logic is to continue training from their pre-trained model. That's what I did. Then I retrained the model from scratch, and after a few epochs I almost reproduced it, with slightly less clarity. I encountered and resolved many difficulties during the reproduction process, so if you share your code, I can point out where the problems might be.

> Thank you very much for your kindness, but the code cannot be shared due to my affiliation. One observation: training works well on one GPU but produces blue, blurry results on more than one GPU. I am still trying to solve it.

Hey! I solved this problem by using `accelerator.unwrap_model()` correctly. The blue, blurry results were produced because I called `accelerator.unwrap_model(mimicmotion_model)`; the unwrapping does not seem to be applied to the submodules. Calling `accelerator.unwrap_model(mimicmotion_model.unet)` and `accelerator.unwrap_model(mimicmotion_model.posenet)` works for me.

ShadowLau commented 1 month ago

> In fact, a simple way to validate your code logic is to continue training from their pre-trained model. That's what I did. Then I retrained the model from scratch, and after a few epochs I almost reproduced it, with slightly less clarity. I encountered and resolved many difficulties during the reproduction process, so if you share your code, I can point out where the problems might be.

> Thank you very much for your kindness, but the code cannot be shared due to my affiliation. One observation: training works well on one GPU but produces blue, blurry results on more than one GPU. I am still trying to solve it.

> Hey! I solved this problem by using `accelerator.unwrap_model()` correctly. The blue, blurry results were produced because I called `accelerator.unwrap_model(mimicmotion_model)`; the unwrapping does not seem to be applied to the submodules. Calling `accelerator.unwrap_model(mimicmotion_model.unet)` and `accelerator.unwrap_model(mimicmotion_model.posenet)` works for me.

Hi @wtjiang98, may I ask how many steps you trained the model for? I have tried to reproduce the model, but the generation result is still not good after training for 100,000 steps. (If I initialize the model with the provided MimicMotion weights, the generation result is good.)