Why not use existing pretrained landmark detection networks?

shiluooulihs commented 4 years ago

I have a question: for the VoxCeleb dataset, could we replace the KPDetector directly with a pretrained face landmark detection network?

Have you run similar experiments? Or did you have other considerations?

AliaksandrSiarohin commented 4 years ago

Because for me this is not an interesting research direction.

shiluooulihs commented 4 years ago

Maybe you misunderstood what I meant. I'm not talking about doing research on face landmark detection. There are existing landmark detection networks, such as face_alignment or dlib.

So why not use the keypoints detected by these networks directly? You trained a new network, KPDetector, instead. Did you have particular considerations?

Also, when I retrained the first-order-model, I found that the keypoints detected by KPDetector are not at semantically meaningful positions. But it works well for image animation!

The paper Few-shot Video-to-Video Synthesis (also at NIPS 2019) may have borrowed some ideas from your previous paper, MonkeyNet, but it uses pretrained face landmark detection directly.

AliaksandrSiarohin commented 4 years ago

I understood it correctly; please check the MonkeyNet paper for the motivation. The main goal of this research direction is to learn from a raw set of videos (i.e., in an unsupervised manner). Keypoints are only available for faces or human bodies, while the main novelty of this method and MonkeyNet with respect to others is the ability to train on raw videos without any domain knowledge.

shiluooulihs commented 4 years ago

Thank you very much for your reply!

AliaksandrSiarohin / first-order-model

Why not use existing pretrained landmark detection networks? #88