georgeretsi / smirk

Official PyTorch Implementation of SMIRK: 3D Facial Expressions through Analysis-by-Neural-Synthesis (CVPR 2024)
https://georgeretsi.github.io/smirk/
MIT License
118 stars, 7 forks

Some questions about SMIRK #14

Open GrizzCMX opened 2 weeks ago

GrizzCMX commented 2 weeks ago

Hello George!

Thank you for your work and open-source contributions; it's indeed very enlightening. I have a few simple questions about your work:

1) Besides the common 3DMM parameters such as shape, expression, global pose + jaw pose, and cam, SMIRK uses an additional 'eyelid' parameter. I'm not sure whether this corresponds to the eye_pose parameter in the generic FLAME model. Has the FLAME model used in your open-source code been modified, as Figures 1 and 2 suggest, or does this parameter serve an additional purpose? Can it be used for decoding with the generic FLAME model?

2) For the same input image (Figure 3), SMIRK's reconstruction shows better eye closure than DECA and EMOCA, as shown in Figures 4 and 5. Since SMIRK does not provide code for generating meshes, I combined the decoded FLAME vertices from SMIRK with the generic FLAME faces to generate an OBJ file. Although SMIRK's reconstruction is closer to the input, there is a noticeable issue: the eyelids and eyeballs overlap, and this is not an isolated case (see the zoomed-in image). What could be causing this problem, and is there a solution?

image

fig1: the 3DMM params in SMIRK

image

fig2: the 3DMM params in DECA and EMOCA

image

fig3: the input image

image image

fig4: the SMIRK result

image image

fig5: the EMOCA result
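
The OBJ export described in question 2 (decoded vertices plus the generic FLAME faces) can be sketched roughly as follows. This is a minimal illustration, not code from the SMIRK repository: `write_obj` is a hypothetical helper, and it assumes `vertices` is a `(V, 3)` array for a single mesh and `faces` is a `(F, 3)` array of 0-based triangle indices.

```python
import numpy as np


def write_obj(path, vertices, faces):
    """Write one triangle mesh to a Wavefront OBJ file (hypothetical helper).

    vertices: (V, 3) array of xyz positions.
    faces:    (F, 3) array of 0-based vertex indices.
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)
    if vertices.ndim != 2 or vertices.shape[1] != 3:
        raise ValueError("vertices must have shape (V, 3)")
    if faces.ndim != 2 or faces.shape[1] != 3:
        raise ValueError("faces must have shape (F, 3)")
    if faces.size and (faces.min() < 0 or faces.max() >= len(vertices)):
        raise ValueError("face index out of range")

    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
        # OBJ face indices are 1-based, so shift the 0-based indices by one.
        for a, b, c in faces + 1:
            f.write(f"f {a} {b} {c}\n")
```

If the decoder returns a batched tensor, one mesh would first have to be selected and converted to a NumPy array before calling the helper.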

filby89 commented 2 weeks ago

Hey @GrizzCMX, inline answers follow :)

  1. The eyelid parameters independently control left and right eyelid closure. We took them from the metrical tracker of MICA: https://github.com/Zielon/metrical-tracker/tree/master/flame/blendshapes. Inside FLAME.py you can see how these two extra blendshapes are used in the model. They are essentially extra expression blendshapes, because I think the original expression space could not model independent eyelid closure.

  2. This does indeed look like an erroneous reconstruction; the process you described for getting the .obj file seems right. It may be because, as we have noticed, the eyelid_parameters have some weird interplay with the other parameters. We are going to look into it and try to reproduce it. In the meantime, you could try something very practical, such as slightly reducing the Z coordinate of the eyeball vertices, and let us know how it goes.
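
The practical workaround suggested in point 2 (moving the eyeballs slightly back along Z) could look roughly like the sketch below. Everything here is an assumption for illustration: `shift_eyeballs_z` and `eyeball_idx` are hypothetical names, the thread does not say which vertex indices belong to the eyeballs, the default `dz` is a placeholder with no tuned value, and the sign assumes the face looks along +Z.

```python
import numpy as np


def shift_eyeballs_z(vertices, eyeball_idx, dz=-0.001):
    """Return a copy of the mesh with the eyeball vertices moved by dz along Z.

    vertices:    (V, 3) array of xyz positions for one mesh.
    eyeball_idx: indices of the eyeball vertices (assumed to be known
                 from the mesh topology; not given in this thread).
    dz:          offset along Z; negative moves the eyeballs backwards
                 under the assumption that the face looks along +Z.
    """
    out = np.array(vertices, dtype=np.float64, copy=True)
    idx = np.asarray(eyeball_idx, dtype=np.int64)
    out[idx, 2] += dz  # only the Z column of the selected vertices changes
    return out
```

The input array is left untouched, so the original and shifted meshes can be exported side by side for comparison.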

Btw, our FLAME.py is practically the same as the FLAME.py from DECA and EMOCA (just some variables renamed and split), with the addition of the eyelid blendshapes taken from the metrical tracker above.
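
To make the idea of "two extra expression blendshapes" concrete, here is a minimal NumPy sketch of how per-eyelid offsets could be added to a mesh. It is not the code from FLAME.py: the function name, the argument names, and the column order (0 = left, 1 = right) are assumptions, and the sketch ignores where in the pipeline (before or after pose skinning) the offsets are applied.

```python
import numpy as np


def add_eyelid_offsets(vertices, l_eyelid, r_eyelid, eyelid_params):
    """Add two linear eyelid blendshapes to a batch of meshes (illustrative).

    vertices:      (B, V, 3) meshes.
    l_eyelid:      (V, 3) per-vertex offset for a fully closed left eyelid.
    r_eyelid:      (V, 3) per-vertex offset for a fully closed right eyelid.
    eyelid_params: (B, 2) closure weights; column 0 = left, column 1 = right
                   (this ordering is an assumption).
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    l_eyelid = np.asarray(l_eyelid, dtype=np.float64)
    r_eyelid = np.asarray(r_eyelid, dtype=np.float64)
    eyelid_params = np.asarray(eyelid_params, dtype=np.float64)

    # Reshape the weights to (B, 1, 1) so they broadcast over vertices and xyz.
    w_left = eyelid_params[:, 0][:, None, None]
    w_right = eyelid_params[:, 1][:, None, None]

    # Each eyelid contributes its own offset, scaled independently.
    return vertices + w_left * l_eyelid[None] + w_right * r_eyelid[None]
```

Because each weight scales its own offset field, one eyelid can be closed while the other stays open, which is the independence described in point 1.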

GrizzCMX commented 2 weeks ago

Thank you for your response. I also think this issue might be caused by the 'eyelid' parameter not being completely disentangled from the other 3DMM parameters. Manually adjusting the position of the eyeballs might mitigate the problem. However, I have run experiments on large datasets (such as the MEAD dataset) and found that the issue is not incidental, so manual adjustments are of limited use. I really hope your team will consider researching improvements to make this work applicable to a broader range of downstream tasks (because its results are indeed impressive!). Thank you again for your work and open-source contributions.