X-LANCE / AniTalker

[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
https://x-lance.github.io/AniTalker/
Apache License 2.0

Cropping issue discussion #40

Open nitinmukesh opened 3 weeks ago

nitinmukesh commented 3 weeks ago

Arvrairobo commented 3 days ago

@nitinmukesh thank you for your PR. I applied it and it runs successfully, but it does not crop the image. As a result, when the input is a full image (not just a face), AniTalker cannot generate a talking face. Could you please recheck the cropping code? It should automatically crop the image and use the cropped image as input, correct? It is not working like that for me.
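The behavior expected here (find the face, crop around it, feed the crop to the model) can be illustrated with a minimal sketch. This is not the code from the PR: `square_crop_box`, its parameters, and the default `scale` are hypothetical, and the face box is assumed to come from whatever face detector is in use. It only computes a square crop region that stays inside the image; the crop would then be resized to the model's input resolution (256 x 256, according to a later reply in this thread).

```python
def square_crop_box(face_box, image_size, scale=2.0):
    """Return (left, top, right, bottom) of a square crop around a face.

    face_box   : (x0, y0, x1, y1) from any face detector (assumed input)
    image_size : (width, height) of the full image
    scale      : crop side relative to the face size (assumed value)
    """
    x0, y0, x1, y1 = face_box
    width, height = image_size

    # Center of the detected face.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2

    # Square side: enlarged face size, but never larger than the image.
    side = int(min(max(x1 - x0, y1 - y0) * scale, width, height))

    # Shift the square so it stays fully inside the image.
    left = int(min(max(cx - side / 2, 0), width - side))
    top = int(min(max(cy - side / 2, 0), height - side))
    return left, top, left + side, top + side


# A 200 x 200 face inside a 1920 x 1080 photo.
print(square_crop_box((400, 200, 600, 400), (1920, 1080)))  # (300, 100, 700, 500)
```

Keeping the crop square avoids distorting the face when it is later resized to a square input.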

For the following error:

ERROR: No matching distribution found for pytorch-lightning==1.6.5

pin pip to version 24.0:

python -m pip install pip==24.0

Also make sure to follow the installation steps in https://github.com/X-LANCE/AniTalker/blob/main/md_docs/run_on_windows.md
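Before reinstalling, it may help to check whether the installed pip is newer than the 24.0 suggested above. The sketch below assumes that a newer pip is what triggers the error, which this thread does not state explicitly. `parse_version` and `pip_newer_than_pinned` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '24.1.2' into (24, 1, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pip_newer_than_pinned(installed, pinned="24.0"):
    """True if the installed pip version is newer than the pinned one."""
    return parse_version(installed) > parse_version(pinned)


if __name__ == "__main__":
    try:
        installed = version("pip")
        print(installed, "newer than 24.0:", pip_newer_than_pinned(installed))
    except PackageNotFoundError:
        print("pip is not installed in this environment")
```

If this reports a newer pip, running the pin command above before installing the requirements is the step suggested in this comment.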

nitinmukesh commented 3 weeks ago

Test result with auto-crop

Images used: 1, 2, 3, 5

Output

https://github.com/user-attachments/assets/2af3f557-b254-4d71-845d-9d08930443a6

https://github.com/user-attachments/assets/72395594-de4e-4aa9-b833-82346d409593

https://github.com/user-attachments/assets/08ec6bcf-7f41-4f4e-996d-80170bedc73a

https://github.com/user-attachments/assets/78ab838f-147c-4452-a62d-ae5c84f3238e

Arvrairobo commented 3 weeks ago

@nitinmukesh thank you for the help. I figured out why it was not working: I had missed two files, the .dat file and the .npy file. Once I placed them in the data_preprocess folder, it started to work. Thank you very much for your help.
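A quick pre-flight check can catch this kind of silent failure. The sketch below only checks that the data_preprocess folder contains at least one file of each extension; the actual file names are not given in this thread, so none are assumed. `missing_extensions` is a hypothetical helper.

```python
from pathlib import Path


def missing_extensions(folder, required=(".dat", ".npy")):
    """Return the required extensions that have no matching file in folder."""
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder means every required file is missing.
        return list(required)
    present = {p.suffix.lower() for p in folder.iterdir() if p.is_file()}
    return [ext for ext in required if ext not in present]


# An empty list means both kinds of file are present.
print(missing_extensions("data_preprocess"))
```

Running it from the repository root before inference shows which of the two file types still needs to be copied in.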

Arvrairobo commented 3 weeks ago

@nitinmukesh @newgenai79 can we achieve cropping at 512 x 512 resolution instead of 256 x 256?

Also, the blinking and eye movement are not very natural compared to EchoMimic. How can we achieve the same? I tried hubert_pose as well but did not get a good result.

What is hubert_full_control?

nitinmukesh commented 3 weeks ago

> @nitinmukesh @newgenai79 can we achieve cropping at 512 x 512 resolution instead of 256 x 256?

I don't think it's possible since the model is trained on 256 x 256 resolution.

> Also, the blinking and eye movement are not very natural compared to EchoMimic. How can we achieve the same?

EchoMimic is good because it is trained on a large dataset, hence the better results.

Arvrairobo commented 3 weeks ago

@nitinmukesh EchoMimic is very slow: a 40-second video took about 2.5 hours, so it is not a practical solution. AniTalker is very fast, which is why I am building a solution on top of it.

If @liutaocode or any other author can shed some light on this, or give a head start on how to achieve blinking and facial expressions, that would be great. Releasing code or more trained models would also work. Looking forward to it.

Arvrairobo commented 1 week ago

Still waiting for the blink feature. Also, is there any update on the project? Any new code release or new features?