Hanbo-Cheng / DAWN-pytorch

Official implementation of Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Results after post processing - 128 x 128 model #9

Closed by nitinmukesh 2 days ago

nitinmukesh commented 2 days ago

https://github.com/user-attachments/assets/aac61c0b-ce64-4143-836b-4483d40b1219

The eye blinks have a slight problem, and the result doesn't look the same as on the project page. What am I doing wrong?

CleberPeter commented 2 days ago

Hi @nitinmukesh, how did you reach this quality? Did you use the 128x128 model? It seems that your result has very good quality even at 512x512.

Below is my result. Am I missing some fine-tuning? Or are you using another tool to perform the post-processing and upscale the generated images?

https://github.com/user-attachments/assets/d7d4be05-73a4-4cbe-943a-72d18a33d779

Thanks in advance.

Hanbo-Cheng commented 2 days ago

@nitinmukesh Your postprocessing is so impressive! Could you share it with me? Thank you so much!

The project page results were generated with the 256×256 model, and I found the 256×256 setting to be better for driving out-of-dataset (HDTF) samples.

Also, for the demo page I generate pose and blink separately (one model generates the pose and another generates the blinks), but I think it's not as convenient as generating them together. The quantitative results show little difference between the two approaches. In any case, I will upload the script for separate generation soon.
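For illustration only, here is a rough sketch of what "separate generation" can look like: two independent networks produce the pose and blink sequences from the same audio features, and the two sequences are concatenated into a single driving signal. The function name, network placeholders, and tensor shapes below are hypothetical, not DAWN's actual API.

```python
import torch

# Hypothetical sketch: merge separately generated pose and blink sequences
# into one driving-signal tensor before it conditions the video generator.
# `pose_net` and `blink_net` are placeholders, NOT modules from this repo.

def build_driving_signal(pose_net, blink_net, audio_feat: torch.Tensor) -> torch.Tensor:
    """audio_feat: (T, D_audio) audio features for T output frames."""
    with torch.no_grad():
        pose_seq = pose_net(audio_feat)    # (T, 6)  e.g. head rotation + translation
        blink_seq = blink_net(audio_feat)  # (T, 1)  e.g. per-frame eye-openness scalar
    # Concatenate along the feature dimension so each frame carries both cues.
    return torch.cat([pose_seq, blink_seq], dim=-1)  # (T, 7)
```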

nitinmukesh commented 2 days ago

@CleberPeter @Hanbo-Cheng Here is the guide I followed for post-processing: https://www.youtube.com/watch?v=iVy2bXPQNKY&t=575s
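For a rough idea of what such a post-processing pass can look like, below is a minimal sketch assuming a frame-wise face-restoration model (GFPGAN here); the linked guide may well use a different tool or settings, and the input/output file names and upscale factor are assumptions.

```python
import cv2
from gfpgan import GFPGANer  # pip install gfpgan

# Frame-wise face restoration + upscaling of a generated clip.
# Model path, upscale factor, and file names are assumptions for this sketch.
restorer = GFPGANer(model_path='GFPGANv1.4.pth', upscale=4,
                    arch='clean', channel_multiplier=2, bg_upsampler=None)

reader = cv2.VideoCapture('dawn_output_128.mp4')   # low-res DAWN output
fps = reader.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ok, frame = reader.read()
    if not ok:
        break
    # enhance() returns (cropped_faces, restored_faces, restored_img)
    _, _, restored = restorer.enhance(frame, has_aligned=False,
                                      only_center_face=False, paste_back=True)
    if writer is None:
        h, w = restored.shape[:2]
        writer = cv2.VideoWriter('dawn_output_upscaled.mp4',
                                 cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    writer.write(restored)

reader.release()
if writer is not None:
    writer.release()
```

The audio track from the original clip still has to be re-muxed afterwards (e.g. with ffmpeg), and face-restoration models can subtly alter identity details, so results will vary.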