HumanAIGC / EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Does anyone have a plan to reproduce EMO? #202

Open liuzhuang1024 opened 6 months ago

johndpope commented 6 months ago

Progress so far:

Downloaded: https://academictorrents.com/download/843b5adb0358124d388c4e9836654c246b988ff4.torrent

https://github.com/johndpope/Emote-hack

Created a data loader, EMODataset.py.

HeadRotation.py: uses mediapipe (get_head_pose_velocities_at_frame).

SpeedEncoder: applies the tanh mapping to categorise head motion into speed buckets.

Wav2VecFeatureExtractor: uses wav2vec2 (maybe it needs to be downgraded?); extract_features_from_mp4 gets nearby frames.

Backbone / attention layers - WIP https://github.com/johndpope/Emote-hack/blob/main/Net.py
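
For reference, a minimal sketch of the tanh speed-bucket idea mentioned for SpeedEncoder. It assumes the per-bucket mapping is tanh((w - c) / r * 3), which is how the EMO paper appears to describe it. The function name, bucket centers, and radii below are made up for illustration and are not taken from Emote-hack.

```python
import math


def speed_bucket_embedding(velocity, centers, radii):
    """Map one head-rotation velocity to one tanh score per speed bucket.

    Assumed formula: tanh((velocity - center) / radius * 3).
    """
    if len(centers) != len(radii):
        raise ValueError("centers and radii must have the same length")
    return [math.tanh((velocity - c) / r * 3.0) for c, r in zip(centers, radii)]


# Hypothetical buckets (illustrative values only, not from the paper or repo).
centers = [0.0, 0.5, 1.0]
radii = [0.25, 0.25, 0.25]
embedding = speed_bucket_embedding(0.5, centers, radii)
```

A score near 0 means the velocity sits at that bucket's center; scores near +1 or -1 mean it lies well above or below the bucket.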

UPDATE: the rest of the code is a scrap heap at the moment.

The previous paper by HumanAIGC, AnimateAnyone, was successfully implemented by MooreThreads (includes training code). Upvote here for them to recreate the EMO paper: https://github.com/MooreThreads/Moore-AnimateAnyone/issues/98

This code is mostly cherry-picked from magic-animate, which does not include training code and is based on animatediff: https://github.com/magic-research/magic-animate/tree/main

fire17 commented 6 months ago

@liuzhuang1024 awesome, was gonna ask the same

@johndpope good luck man! We're all eagerly patient to see what you come up with!

Has anyone found any other teams or people working on this? Hope to see people help each other and make this a reality. We're about to enter the golden age of open source 😁 All the best everyone!

liutaocode commented 6 months ago

@johndpope, great job. However, I have some suggestions after watching some videos from CelebV-HQ. You may want to filter them manually to ensure high resolution and remove bad cases, such as individuals wearing masks or significant background changes. From my limited experience, training or fine-tuning Stable Diffusion can be quite unstable if the data contains too many bad cases.
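
A rough sketch of what such a filter could look like once each clip has been annotated. Everything here is hypothetical: the `ClipInfo` fields, the 512-pixel threshold, and the file names are assumptions for illustration, and the flags would have to come from manual review or your own detectors.

```python
from dataclasses import dataclass


@dataclass
class ClipInfo:
    # Hypothetical per-clip annotations (manual review or your own detectors).
    path: str
    width: int
    height: int
    face_covered: bool        # e.g. the person is wearing a mask
    background_changes: bool  # e.g. a scene cut or strong camera motion


def keep_clip(clip, min_side=512):
    """Return True if the clip passes the resolution and bad-case checks.

    min_side is an assumed threshold, not a value from the EMO paper.
    """
    if min(clip.width, clip.height) < min_side:
        return False
    if clip.face_covered or clip.background_changes:
        return False
    return True


# Made-up examples to show the filter in use.
clips = [
    ClipInfo("a.mp4", 1024, 1024, False, False),
    ClipInfo("b.mp4", 320, 320, False, False),   # too small
    ClipInfo("c.mp4", 1024, 1024, True, False),  # face covered
]
kept = [c.path for c in clips if keep_clip(c)]
```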