Tencent / MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
https://tencent.github.io/MimicMotion/

About the training of 72-frame model #65

Open QifengDai opened 3 months ago

QifengDai commented 3 months ago

Thank you for your great work.

I'm trying to implement MimicMotion's training code, and I noticed that the default inference num_frames went from 16 for MimicMotion_1.pth to 72 for MimicMotion_1-1.pth.

In my case, I train with num_frames 16 at a resolution of 576 x 1024, with gradient_checkpointing and 8bit_adam enabled, and GPU memory usage is already close to 70 GB. But when I run inference with 72 frames, the video shows large temporal inconsistencies.
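For reference, here is a minimal sketch of how I enable those two memory savers in a diffusers-style training script. This is my own setup, not MimicMotion's released training code; the base checkpoint, learning rate, and weight decay below are assumptions.

```python
import torch
import bitsandbytes as bnb
from diffusers import UNetSpatioTemporalConditionModel

# Load the SVD spatio-temporal UNet (base checkpoint is an assumption).
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    subfolder="unet",
    torch_dtype=torch.float32,
)

# Gradient checkpointing: recompute activations in the backward pass,
# trading extra compute for a large drop in activation memory.
unet.enable_gradient_checkpointing()
unet.train()

# 8-bit Adam keeps optimizer states quantized, roughly quartering their
# memory footprint compared with standard AdamW (hyperparameters assumed).
optimizer = bnb.optim.AdamW8bit(
    unet.parameters(), lr=1e-5, weight_decay=1e-2
)
```

Even with both enabled, the memory is dominated by the temporal attention activations, which grow with num_frames, so 16 frames already sits near 70 GB on my setup.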

What tricks do you use to train with 72 frames? Training at that sequence length requires a considerable amount of GPU memory. I'd welcome a discussion of the details and training tricks behind the 72-frame model.

Thank you!

iiinsight commented 2 months ago

mark