AliaksandrSiarohin / first-order-model

This repository contains the source code for the paper First Order Motion Model for Image Animation
https://aliaksandrsiarohin.github.io/first-order-model-website/
MIT License

A few basic questions on implementation details. #109

Open mazzzystar opened 4 years ago

mazzzystar commented 4 years ago

Thanks for your work! Here are some questions:

What does it stand for? We already have a) the warp module and b) the occlusion module.

I would appreciate your answer ~

mazzzystar commented 3 years ago

@lastsongforu If we use GT keypoints as input, then we can't learn a keypoint detector, so we cannot compute the first-order consistency loss when the image is slightly transformed. @AliaksandrSiarohin tell me if I'm wrong :-p

lastsongforu commented 3 years ago

@mazzzystar Thank you for your quick reply. I think the first-order consistency loss is only used for training the kp_detector? So if we have the GT keypoints as input, we just need to train a conv2d to get the Jacobian. Do you mean that using GT keypoints will disable the constraint on the Jacobian?

mazzzystar commented 3 years ago

@lastsongforu yes.
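For context, the consistency loss being discussed is an equivariance constraint: keypoints detected on a warped frame should move with the warp. A minimal sketch of that idea, assuming a simple affine warp and perfect detections (function names and shapes are mine, not the repository's code, which uses a random thin-plate-spline transform):

```python
import numpy as np

def warp_coordinates(coords, A, b):
    # Apply an affine warp T(z) = A z + b to an (N, 2) array of keypoints.
    return coords @ A.T + b

def equivariance_loss(kp_orig, kp_on_warped, A, b):
    # First-order consistency in its simplest form: keypoints detected on
    # the original frame should match the warped keypoints detected on the
    # warped frame, i.e. kp(x) ≈ T(kp(T^{-1}(x))).
    return np.abs(kp_orig - warp_coordinates(kp_on_warped, A, b)).mean()

# Toy check with a perfectly equivariant "detector": warping the frame by
# T^{-1} moves its keypoints by T^{-1}, so the loss is (numerically) zero.
rng = np.random.default_rng(0)
A = np.array([[1.1, 0.0], [0.0, 0.9]])
b = np.array([0.05, -0.02])
kp_orig = rng.random((10, 2))
kp_on_warped = (kp_orig - b) @ np.linalg.inv(A).T
print(equivariance_loss(kp_orig, kp_on_warped, A, b) < 1e-9)  # True
```

With fixed GT keypoints there is nothing for this loss to train, which is the point made above.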

FrankXinqi commented 3 years ago

@mazzzystar Good questions found in your discussion.

Currently, I also want to learn GT keypoints for my face keypoints, since the self-supervised one is not desired. Could you please explain more about your L1 loss configuration? Is that just |GT-Pred|?

mazzzystar commented 3 years ago

@FrankXinqi right.
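So the supervision in question is just an elementwise L1 between ground-truth and predicted keypoint coordinates. A minimal sketch (the numbers are made up for illustration):

```python
import numpy as np

def keypoint_l1_loss(pred_kp, gt_kp):
    # Plain L1 supervision |GT - Pred|, averaged over keypoints and coords.
    return np.abs(gt_kp - pred_kp).mean()

gt = np.array([[0.0, 0.25], [0.5, 0.5]])
pred = np.array([[0.0, 0.5], [0.25, 0.5]])
print(keypoint_l1_loss(pred, gt))  # 0.125
```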

wjtan99 commented 2 years ago

Thanks for these great discussions. @FrankXinqi Do you have progress to learn a keypoint detector from GT keypoints? @mazzzystar Will you open-source your code?

mazzzystar commented 2 years ago

@wjtan99 the code is in https://github.com/AliaksandrSiarohin/first-order-model/issues/109#issuecomment-651023009 and https://github.com/AliaksandrSiarohin/first-order-model/issues/109#issuecomment-651100628

cnnAndBn commented 1 year ago

No this is not the purpose. The purpose is to make independent motion predictions for S and D. If the motion predictions are dependent on each other, e.g. if the keypoint predictor uses a concatenation of S and D, it won't generalise.

Hi, I am still a little confused about this. You say the keypoint detector makes independent motion predictions for S and D, but motion requires at least two frames; if a function (here the KPDetector class) only takes one frame, how can we estimate the motion? I think the KPDetector only detects keypoints, so what is the physical meaning of "first order" here? The local appearance variation around a keypoint? @AliaksandrSiarohin
hi: now I am still a little confused about it, you say the keypoint detector make independent motion predictions for S, D, but I think the motion should be at least two frames, if a function (here the KPDetector class) only takes one frame , how can we estimate the motion? I think the KPDetector only detect key point, but what is the first order physical meaning here? the key point local apperance variation? @AliaksandrSiarohin @AliaksandrSiarohin