cvlab-kaist / GaussianTalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn and Seungryong Kim

question in data processing #7

Closed beria-moon closed 4 months ago

beria-moon commented 5 months ago

The extracted GT image, parsing image, and torso image do not look right.

[attached image]
beria-moon commented 5 months ago

This is a case from the HDTF dataset. I just ran process.py without making any modifications.

kyusuncho commented 5 months ago

Thank you for your interest in our research :)

The face parsing network utilized in our pipeline originates from the AD-NeRF framework, as we aimed for a fair comparison with AD-NeRF, RAD-NeRF, and ER-NeRF. While we acknowledge that the performance of the face parsing network may not be optimal, we recommend manually cropping the images to focus on the facial regions within a square area for improved quality. The face parsing should improve when the face occupies a larger portion of the whole video, although it may not be as precise as state-of-the-art works such as SAM.

If you're conducting experiments on the HDTF dataset, you can access facial cropping code from the GitHub repository at https://github.com/MRzzm/HDTF/tree/main. We employed this cropping technique for our experiments with the HDTF dataset, which may prove beneficial for addressing the issues observed.
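For readers who want to try the square-crop suggestion by hand, here is a minimal sketch. It is not the HDTF repository's cropping code and not part of process.py. The function names, the `scale` margin of 1.8, and the 512x512 output size are illustrative assumptions, and the face bounding box is assumed to come from whatever face detector you already use.

```python
def square_crop_box(face_box, frame_w, frame_h, scale=1.8):
    """Return (left, top, side) of a square crop centered on a face box.

    face_box is (x0, y0, x1, y1) in pixels, from any face detector.
    scale enlarges the box so hair, chin, and some torso stay in frame;
    1.8 is an illustrative guess, not a value from the authors.
    """
    x0, y0, x1, y1 = face_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0

    # Square side: the larger face dimension plus margin, capped by the frame.
    side = int(round(max(x1 - x0, y1 - y0) * scale))
    side = min(side, frame_w, frame_h)

    # Center the square on the face, then shift it back inside the frame.
    left = int(round(cx - side / 2.0))
    top = int(round(cy - side / 2.0))
    left = max(0, min(left, frame_w - side))
    top = max(0, min(top, frame_h - side))
    return left, top, side


def ffmpeg_crop_filter(left, top, side, out_size=512):
    """Build an ffmpeg -vf string that crops the square and resizes it.

    out_size=512 is an assumption; use the resolution your pipeline expects.
    """
    return f"crop={side}:{side}:{left}:{top},scale={out_size}:{out_size}"


if __name__ == "__main__":
    # Hypothetical face box in a 1280x720 frame.
    left, top, side = square_crop_box((400, 100, 600, 350), 1280, 720)
    print(ffmpeg_crop_filter(left, top, side))
```

The printed string can be passed to ffmpeg as `-vf "<filter>"` before running process.py on the cropped video. Using one fixed box for the whole clip, for example one computed from the union of per-frame face boxes, should keep the head from jittering between frames.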

Should you require further assistance or have additional questions, please don't hesitate to reach out.

beria-moon commented 5 months ago

Thanks for your kind reply. I will try your suggestion.


------------------ Original message ------------------ From: "KU-CVLAB/GaussianTalker"; Sent: Tuesday, April 30, 2024, 3:57 PM; Subject: Re: [KU-CVLAB/GaussianTalker] question in data processing (Issue #7)
