YuanxunLu / LiveSpeechPortraits

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)
MIT License

About training the model on my own data #20

Closed: zbdehh closed this issue 2 years ago.

zbdehh commented 2 years ago

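Thanks for open-sourcing the code. I have some questions about training on my own data. My current understanding is that the model consists of Audio2Feature, Audio2Headpose, and Feature2Face, and that to train on my own data I need to retrain Audio2Headpose and Feature2Face. The dataset for the Audio2Headpose model needs per-frame 2d_landmark, 3d_landmark, trans, and headpose. From Sec. 4.1 of the paper:
1. 2d_landmark comes from 73-point landmarks detected with an open-source tool;
2. 3d_landmark, trans, and headpose come from 3D face reconstruction.

Question 1: The datasets are split into audiovisual_dataset and face_dataset. Is audiovisual_dataset used for training Audio2Feature and Audio2Headpose, and face_dataset for training Feature2Face?
Question 2: 3d_landmark exists in both 3d_fit_data.npz and tracked3D_normalized_pts_fix_contour.npy. What is the difference between them? How are the "3D normalization" and the "fix contour" step done?
Question 3: 3d_landmark contains negative values. Is the origin at the center of the image? If so, what do scale, xc, and yc in change_paras.npz represent?
Question 4: Is the data in tracked2D_normalized_pts_fix_contour.npy detected directly by the open-source tool? Its values are greater than 1, so it does not seem to be normalized. Is the relationship between it and 3d_landmark that one lies in the pixel coordinate system (2D) and the other in the camera coordinate system (3D)?

I would also really appreciate it if you could write detailed documentation on how to build the dataset.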

YuanxunLu commented 2 years ago

First, all three models (Audio2Feature, Audio2Headpose, and Feature2Face) should be retrained for any new data.

Other questions:

1. 2D landmarks: the 73-point landmark detector is not open-source; it is a closed-source tool developed by the company. Commonly used landmark detectors such as face_alignment (68 points) also work, but you need to change the landmarks' semantic settings.

2.1 Yes.

2.2 Actually, I didn't do any normalization on the tracked 3D points. Sorry that the file name 'tracked3D_normalized_pts_fix_contour.npy' may have misled you (as you will see if you check the data carefully). The only difference is that I fixed the face contour points in the latter file. "Contour" here means that I fixed the contour indices for the reconstruction results; this belongs to a different area, 3D face model tracking. In short, I used fixed contour indices instead of the sliding contour indices found during tracking. (If you are familiar with this area, this will make sense quickly.)

2.3 The 3D landmarks are object coordinates of the face model, i.e., FaceWarehouse in this project. They have no relationship to the image or anything else; they are simply the tracked coordinates of the face model. As described in the paper, I tracked the face at the original resolution. However, our desired results require cutting, cropping, and resizing the image, so 'scale', 'xc', 'yc', and the other parameters denote the cut/crop/resize parameters. I need these parameters to transform the tracking results and camera parameters to fit the desired results (resolution, face location, etc.).

2.4 'tracked2D_normalized_pts_fix_contour.npy' contains the transformed tracked 2D facial landmarks of the training data. Again, sorry that 'normalized' may have misled you; 'transformed' would be a better word. The points lie in image coordinates (512x512).
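Because the 73-point detector is closed-source, switching to face_alignment means re-describing which indices belong to which facial part. As a hedged illustration of what "changing the landmarks' semantic settings" might involve, the sketch below lists the standard 68-point (iBUG 300-W) index groups that face_alignment outputs. The dictionary name, the group names, and the helper functions are hypothetical; the repository's actual configuration format may look entirely different.

```python
# Hypothetical semantic groups for the standard 68-point (iBUG 300-W) layout
# produced by face_alignment. Names and structure are illustrative only and
# are NOT LiveSpeechPortraits' actual landmark settings.
LANDMARK_GROUPS_68 = {
    "jaw": range(0, 17),             # face contour / jawline
    "right_eyebrow": range(17, 22),  # subject's right
    "left_eyebrow": range(22, 27),
    "nose_bridge": range(27, 31),
    "nose_lower": range(31, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth_outer": range(48, 60),
    "mouth_inner": range(60, 68),
}


def check_groups(groups, num_points):
    """Return True if the groups cover 0..num_points-1 exactly once."""
    seen = sorted(i for r in groups.values() for i in r)
    return seen == list(range(num_points))


def mouth_indices(groups):
    """All mouth point indices, e.g. for a lip-focused loss or crop."""
    return list(groups["mouth_outer"]) + list(groups["mouth_inner"])
```

A check like `check_groups` is a cheap guard when porting index-based settings from one landmark scheme to another, since a single off-by-one range silently mislabels points.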
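A minimal NumPy sketch of the difference between sliding and fixed contour indices, assuming the per-frame tracked mesh vertices are available as a (T, V, 3) array. In model-based face tracking, the jawline landmarks lie on the face silhouette, which moves across the mesh as the head turns, so trackers commonly re-select ("slide") the contour vertex indices every frame; fixing them means reusing one index set for every frame, so each contour point keeps the same mesh vertex over time. The function name and array layout are assumptions for illustration, not the repository's code.

```python
import numpy as np


def gather_landmarks(vertices, indices):
    """Pick landmark vertices from tracked meshes (hypothetical helper).

    vertices: (T, V, 3) per-frame mesh vertices in model (object) coordinates.
    indices:  (L,)   one index set shared by all frames -> fixed contour
              (T, L) a separate index set per frame     -> sliding contour
    returns:  (T, L, 3) landmark coordinates per frame.
    """
    vertices = np.asarray(vertices)
    indices = np.asarray(indices)
    if indices.ndim == 1:
        # Fixed indices: the same vertices are selected in every frame.
        return vertices[:, indices, :]
    # Sliding indices: per-frame selection; the trailing axis of size 1
    # broadcasts against the xyz axis of `vertices`.
    return np.take_along_axis(vertices, indices[:, :, None], axis=1)
```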
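Answers 2.3 and 2.4 imply a two-stage relationship: the 3D landmarks are in the face model's object coordinates, while the 2D landmarks are in the 512x512 cropped image. The sketch below assumes a standard pinhole camera (rotation R, translation t, intrinsics K, no lens distortion) and further assumes that `scale`, `xc`, `yc` describe a crop centered at (xc, yc) of the original-resolution frame followed by a uniform resize by `scale`. The thread does not document the actual conventions in change_paras.npz, so the function names and this parameter interpretation are hypothetical.

```python
import numpy as np


def project_to_original(pts3d, R, t, K):
    """Pinhole projection of object-space points (N, 3) to pixels (N, 2)."""
    cam = pts3d @ R.T + t            # object -> camera coordinates
    uvw = cam @ K.T                  # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide


def to_cropped(pts2d, scale, xc, yc, out_size=512):
    """Map original-resolution pixels into the cropped, resized frame.

    ASSUMPTION: the crop is centered at (xc, yc) and resized by `scale`,
    so the crop center lands in the middle of the out_size x out_size image.
    """
    center = np.array([xc, yc], dtype=float)
    return (pts2d - center) * scale + out_size / 2.0


def crop_intrinsics(K, scale, xc, yc, out_size=512):
    """Fold the same crop/resize into K (same ASSUMPTION as to_cropped)."""
    K2 = K.astype(float).copy()
    K2[:2, :2] *= scale                               # focal lengths (and skew)
    K2[0, 2] = (K[0, 2] - xc) * scale + out_size / 2.0  # principal point x
    K2[1, 2] = (K[1, 2] - yc) * scale + out_size / 2.0  # principal point y
    return K2
```

Folding the crop into the intrinsics is one way to "transform the camera parameters to fit the desired results": projecting with `crop_intrinsics(K, ...)` yields the same pixels as projecting at the original resolution and then applying `to_cropped`.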

About the training, please check issue #19. I won't repeat it here. Thanks for your understanding.