johndpope opened 4 months ago
UPDATE - don't merge - I will drop in some code I built out for https://github.com/johndpope/Emote-hack that loads mp4s using decord and extracts the source image / driving image from frames.
When I started pulling the training code apart with Claude, it needed additional models, which resulted in almost a complete rewrite. I moved to this new project: https://github.com/johndpope/MegaPortrait-hack
Key additions/changes:

- Added the `distill` function to implement the student model distillation process described in Section 3.3 of the paper.
- Added command line arguments for student model distillation:
  - `--num-avatars`: number of avatars to distill to the student (default 100)
  - `--print-freq`: print frequency for logging distillation loss
- Updated the `main` function to call `distill` after the base and high-res models are trained
- Defined perceptual loss `L_per` and adversarial loss `L_adv` used during distillation (implementation not shown, placeholders used)
- Minor fixes like device placement of some models
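The CLI flags above can be sketched with `argparse`; the `--num-avatars` default of 100 is from the description, while the `--print-freq` default here is an assumption:

```python
import argparse

def build_parser():
    # Mirrors the distillation flags described above.
    p = argparse.ArgumentParser(description="Student distillation options")
    p.add_argument("--num-avatars", type=int, default=100,
                   help="Number of avatars to distill to the student")
    p.add_argument("--print-freq", type=int, default=10,  # assumed default
                   help="Print frequency for logging distillation loss")
    return p

args = build_parser().parse_args(["--num-avatars", "50"])
print(args.num_avatars, args.print_freq)  # 50 10
```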
So in summary, the key addition is the code for the distillation process to train a lightweight student model that can mimic the teacher model's outputs for a fixed set of avatars. The training process and losses are implemented based on the description in the paper.
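A rough sketch of what one distillation step could look like. The teacher/student models and both losses are placeholders (as noted above, the real `L_per` and `L_adv` implementations are not shown); shapes, names, and the toy conv models are assumptions:

```python
import torch
import torch.nn as nn

def l_per(pred, target):
    # Placeholder perceptual loss; a real version would compare VGG features.
    return nn.functional.l1_loss(pred, target)

def l_adv(pred):
    # Placeholder adversarial loss; a real version would use a discriminator.
    return pred.mean() * 0.0

def distill_step(teacher, student, batch, optimizer):
    with torch.no_grad():
        target = teacher(batch)          # teacher output for a fixed avatar
    pred = student(batch)                # lightweight student mimics it
    loss = l_per(pred, target) + l_adv(pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in 1x1 conv "models".
teacher = nn.Conv2d(3, 3, 1)
student = nn.Conv2d(3, 3, 1)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
batch = torch.randn(2, 3, 8, 8)
loss_val = distill_step(teacher, student, batch, opt)
```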