Weizhi-Zhong / IP_LAP

CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
Apache License 2.0
637 stars · 72 forks

large content landmarks #11

Closed KingStorm closed 1 year ago

KingStorm commented 1 year ago

Nice work!

I am trying to train IP_LAP with custom data, but in the results the content landmarks are generally larger than the pose landmarks, so there is a mismatch. However, if I use the pretrained model, the size of the resulting content landmarks is correct.

training dataset: a 5-minute 480×640 video

[screenshot: 截屏2023-06-02 21 23 37]

Weizhi-Zhong commented 1 year ago

Hi, thanks for your interest. Does the eval_L1_loss decrease to 6e-3, as described in this issue? If not, what is the final eval_L1_loss of your training with custom data? Since your training dataset is only a 5-minute 480×640 video, I doubt whether it is enough.

Weizhi-Zhong commented 1 year ago

Or, is your training overfitting? Compare the running loss and the eval loss.

KingStorm commented 1 year ago

> Hi, thanks for your interest. Does the eval_L1_loss decrease to 6e-3? as described in this issue. If not, what is the final eval_L1_loss of your training with custom data? Since your training dataset includes 5 min 480 x 640 video, I doubt whether it is enough.

Thanks for your reply. The eval_L1_loss does decrease to the 1e-3 level, so I would consider it overfitted enough. Also, I am testing it on the training data.

KingStorm commented 1 year ago

I have drawn the sketches during landmark training, and they look reasonable: {epoch}_{step}_pred_sketch

However, at inference the sketch turns out to be mismatched between the pose and content landmarks: temp

Weizhi-Zhong commented 1 year ago

> I have drawn sketch during training of landmark

Hi, thanks for your interest. Does it mean that you draw the sketches on the training dataset during training, and on the testing dataset during inference?

KingStorm commented 1 year ago

Hi, I seem to have found a clue about the size mismatch.

I found that the landmarks extracted by preprocess_video.py exactly fill the 128×128 image, with no space left, like: 2358

However, the landmarks extracted by inference_single.py leave some space in the 128×128 image, like: 0_0_pred_sketch_not_replace_Nl_5k

Weizhi-Zhong commented 1 year ago

Hi, thanks for your interest. As shown in the following code:

https://github.com/Weizhi-Zhong/IP_LAP/blob/e5d8fdc1ab01a1426ac4c8cfec461ec5d024050d/preprocess/preprocess_video.py#LL251C19-L251C19

While preprocessing the LRS2 dataset, we add 5 extra pixels to the margin so that the normalized coordinate of the bottom-most landmark is not always 1. Similarly, in inference:

https://github.com/Weizhi-Zhong/IP_LAP/blob/e5d8fdc1ab01a1426ac4c8cfec461ec5d024050d/inference_single.py#LL282C9-L282C9

we add some (25) pixels so that the landmarks stay within the cropping region. Depending on your dataset and input videos, you can change the number of pixels added to the margin so that all landmarks fall within the cropping region.

Hope this can be helpful for you.
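To illustrate the margin logic above, here is a minimal sketch (a hypothetical helper, not the repository's actual code) showing how adding extra pixels below the lowest landmark changes its normalized position inside a 128×128 crop:

```python
import numpy as np

def crop_and_normalize(landmarks, extra_margin=5, size=128):
    """Crop tightly around the landmarks, optionally enlarging the crop
    downward by `extra_margin` pixels, then map coordinates into a
    size x size image.

    landmarks: (N, 2) array of (x, y) pixel coordinates.
    """
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    y_max = y_max + extra_margin  # extra space below the lowest landmark
    w, h = x_max - x_min, y_max - y_min
    return (landmarks - [x_min, y_min]) / [w, h] * size

lms = np.array([[40.0, 60.0], [80.0, 100.0], [60.0, 140.0]])
tight = crop_and_normalize(lms, extra_margin=0)
loose = crop_and_normalize(lms, extra_margin=5)

# With no margin, the lowest landmark sits exactly on the bottom edge.
assert tight[:, 1].max() == 128
# With a margin, it stays strictly inside the crop.
assert loose[:, 1].max() < 128
```

If training and inference use different margins, the same face ends up at different scales inside the crop, which matches the size mismatch described in this issue.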

KingStorm commented 1 year ago

Thanks, fair enough.