Tencent / MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
https://tencent.github.io/MimicMotion/

About the training of 72-frame model #65

Open QifengDai opened 3 months ago

QifengDai commented 3 months ago

Thank you for your great work.

I'm trying to implement MimicMotion's training code, and I noticed that the default inference num_frames went from 16 for MimicMotion_1.pth to 72 for MimicMotion_1-1.pth.

In my case, I train with num_frames 16 at a resolution of 576 x 1024, with gradient_checkpointing and 8bit_adam enabled, and GPU memory usage is already close to 70 GB. But when I run inference with 72 frames, the video shows large temporal inconsistencies.
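For reference, here is a minimal sketch of how I enable those two memory savers in a diffusers-style training script. This is my own setup, not MimicMotion's released training code; the base checkpoint, learning rate, and weight decay below are assumptions.

```python
import torch
import bitsandbytes as bnb
from diffusers import UNetSpatioTemporalConditionModel

# Load the SVD spatio-temporal UNet (base checkpoint is an assumption).
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    subfolder="unet",
    torch_dtype=torch.float32,
)

# Gradient checkpointing: recompute activations in the backward pass,
# trading extra compute for a large drop in activation memory.
unet.enable_gradient_checkpointing()
unet.train()

# 8-bit Adam keeps optimizer states quantized, roughly quartering their
# memory footprint compared with standard AdamW (hyperparameters assumed).
optimizer = bnb.optim.AdamW8bit(
    unet.parameters(), lr=1e-5, weight_decay=1e-2
)
```

Even with both enabled, the memory is dominated by the temporal attention activations, which grow with num_frames, so 16 frames already sits near 70 GB on my setup.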

What tricks do you use to train with 72 frames? Training at that sequence length requires a considerable amount of GPU memory. I'd welcome a discussion of the details and training tricks behind the 72-frame model.

Thank you!

iiinsight commented 2 months ago

mark