AaronJackson / vrn

:man: Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
http://aaronsplace.co.uk/papers/jackson2017recon/
MIT License

3D outer interocular distance calculation #88

Closed Jim61C closed 6 years ago

Jim61C commented 6 years ago

Hi!

Firstly, thank you for making the code available! I am trying to benchmark the method using the NME metric mentioned in the paper. However, I am wondering how you compute the outer interocular distance on the mesh in 3D Euclidean space. Do you use the fixed-topology indices of the 3DMM, and if so, would you mind sharing which mesh indices correspond to the eye region? Thank you!

AaronJackson commented 6 years ago

In the case of the 3DDFA datasets, the 3DMM mesh was indeed used. The vertices used for normalisation were selected by finding the nearest neighbours to landmark points 37 and 46 on many frontal images, prior to computing any metrics (i.e. they remained fixed from that point on).

On Florence and BU4DFE it is clearly more difficult because the meshes do not have fixed correspondence. If I remember correctly, I ran a landmark detector on the frontal texture images and transferred the results to 3D space using the texture coordinates of the 3D object. So in that case I also just used the landmarks mentioned above.
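
For the 3DDFA case, a rough Python sketch of that nearest-neighbour selection (illustrative only, not the original evaluation code; it assumes the 3DMM vertices have already been projected into a frontal image):

    import numpy as np

    def nearest_vertex(verts_2d, point):
        # Index of the vertex whose 2D projection is closest to a landmark.
        d = np.linalg.norm(verts_2d - np.asarray(point), axis=1)
        return int(np.argmin(d))

    # Toy stand-ins: verts_2d would be the Nx2 projection of the 3DMM vertices
    # into a frontal image, and the two points the detected outer eye corners.
    verts_2d = np.array([[10.0, 20.0], [55.0, 60.0], [120.0, 62.0]])
    idx_37 = nearest_vertex(verts_2d, (54.0, 61.0))   # landmark 37 (outer corner of one eye)
    idx_46 = nearest_vertex(verts_2d, (119.0, 61.0))  # landmark 46 (outer corner of the other)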

Jim61C commented 6 years ago

Hi!

Thank you for the clarification! Yes, that is what I am doing for 3DDFA. I see, Florence and BU4DFE do not have fixed topology. Would you mind sharing how you handle the profile views in Florence and BU4DFE then? Ray casting a 2D landmark into 3D is ambiguous, especially when half of the face is occluded. Thanks!

AaronJackson commented 6 years ago

The 2D landmarks are not projected onto the 3D rendering. They are looked up via the texturing information. It's a bit annoying to implement but it works very well. Each face in the mesh has a texture coordinate index, and each texture coordinate has xy values between 0 and 1. So, you scale the texture coordinates back to the original texture size, find each facial point in the texture, and work back from there to the mesh face and then to one of its surrounding vertices. Use the xyz coordinate of the chosen vertex as your new facial point. The XY projection will be the same, but the Z component will be correct even for self-occluded landmarks.
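
A rough Python sketch of that texture-coordinate lookup (illustrative names and an assumed OBJ-style layout, not the released evaluation code; depending on the dataset the v coordinate may need flipping as 1 - v):

    import numpy as np

    def landmark_to_vertex(landmark_px, tex_w, tex_h, verts, uvs, faces_v, faces_vt):
        # verts: Vx3 xyz positions, uvs: Tx2 texture coordinates in [0, 1],
        # faces_v / faces_vt: Fx3 vertex and texture-coordinate indices per triangle,
        # landmark_px: a facial point detected on the texture photograph, in pixels.
        uv_px = uvs * np.array([tex_w, tex_h])          # scale back to the texture size
        tri_uv = uv_px[faces_vt].mean(axis=1)           # texture-space centre of each triangle
        f = np.argmin(np.linalg.norm(tri_uv - landmark_px, axis=1))
        corners_uv = uv_px[faces_vt[f]]                 # that triangle's corners in the texture
        c = np.argmin(np.linalg.norm(corners_uv - landmark_px, axis=1))
        return verts[faces_v[f, c]]                     # xyz of the chosen surrounding vertex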

Jim61C commented 6 years ago

I see, got it. Thank you! I will look into the texture image. Just to clarify, the texture image you are talking about is the conformal mapping from the facial mesh to the 2D plane, right? Something like this:

[attached image: example 2D texture map unwrapped from the mesh]

for the 3D mesh of

[attached image: the corresponding 3D face mesh]

AaronJackson commented 6 years ago

No, otherwise you would not be able to run a landmark detector on them. They are just high-resolution photographs which have been mapped to the 3D object; an example from BU-4DFE is below. I just had a look at the Florence images and I may have done it slightly differently, because they do include frontal textures. If I remember, or find the time to look up what I did, I'll get back to you.

[attached image: 041, an example BU-4DFE texture photograph]

Jim61C commented 6 years ago

I see! Sure, thank you for digging this up! I will look into the other two datasets in more detail and see if I can get the 3D positions of the two eye corners using the method described. Thanks!

Jim61C commented 6 years ago

Hi!

I am just wondering whether you would mind providing the 3DMM vertex indices (for the 300W-LP dataset) corresponding to landmark points 37 and 46 that you mentioned previously? I wish to benchmark against your results, and I believe it would be best to use the same pair of indices for the outer interocular distance. Thank you!

Thanks!

AaronJackson commented 6 years ago

I have the following in my code:

    % Outer interocular distance between the two outer eye-corner vertices of G (1-based indices)
    d = sqrt(sum((G(2853,:) - G(13673,:)).^2,2));

This was applied BEFORE the vertex filter. I haven't checked these indices, so please confirm visually that they are correct; otherwise I might be looking at the wrong code.

Jim61C commented 6 years ago

Hi!

Thank you very much! Yes, that was before the vertex filter and it works! For other people who are interested: when using the Python code base, the indices need to be decreased by 1, since MATLAB uses 1-based indexing.
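
For reference, a minimal Python sketch of the same distance with 0-based indices, and of how it is typically used to normalise the mean per-vertex error (illustrative function names; it assumes the prediction and ground truth are already in per-vertex correspondence):

    import numpy as np

    def outer_interocular_distance(gt):
        # gt: ground-truth 3DMM vertex array BEFORE the vertex filter.
        return np.linalg.norm(gt[2852] - gt[13672])     # 2853/13673 minus 1

    def nme(pred, gt):
        # Mean per-vertex Euclidean error, normalised by the eye-corner distance.
        return np.linalg.norm(pred - gt, axis=1).mean() / outer_interocular_distance(gt)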

Thanks!

Jim61C commented 6 years ago

Hi!

Once again, sorry for all the questions, but I would like to check one more thing about the evaluation. Since the isosurface extracted from the predicted volume is always a closed surface, do you remove the back part of the face during the ICP registration, or do you keep it? I am finding that ICP sometimes cannot register properly when the back part of the face is close to the ground-truth mesh.

Thank you!

AaronJackson commented 6 years ago

The filter is only applied to the groundtruth mesh. ICP is run between this filtered groundtruth and the isosurface of the volume. The two meshes should already be very well aligned before using ICP; ICP is only used to find correspondences between vertices and should not actually modify the mesh.

Are you using the ICP functions in Python? If you have MATLAB, try pcregrigid; it has worked pretty much flawlessly on every face I have given it.
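
If you are using a Python ICP implementation such as the open3d bindings, a minimal sketch of that "correspondence only" usage might look like this (toy arrays stand in for the real meshes; this is not the original MATLAB evaluation):

    import numpy as np
    import open3d as o3d

    # Toy stand-ins: in practice pred_verts are the isosurface vertices of the
    # predicted volume and gt_verts the vertices of the filtered ground truth.
    pred_verts = np.random.rand(500, 3)
    gt_verts = pred_verts + 0.001 * np.random.randn(500, 3)

    pred = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pred_verts))
    gt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(gt_verts))

    # The surfaces are assumed to be roughly registered already, so a small
    # correspondence threshold and an identity initialisation are used; ICP is
    # only asked to associate vertices, not to deform or move the mesh much.
    result = o3d.pipelines.registration.registration_icp(
        pred, gt, max_correspondence_distance=0.01, init=np.eye(4),
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

    corr = np.asarray(result.correspondence_set)  # (pred_index, gt_index) pairs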

Jim61C commented 6 years ago

Hi!

Yes, I am using the open3D Python bindings for ICP, and I have figured out a way to align them better. Also, thank you for the heads-up on the MATLAB function! In addition, may I ask how you obtained the 3D NME for 3DDFA on the AFLW2000-3D dataset, since AFLW2000-3D's ground truth is itself generated by the 3DDFA fitting? Thanks!

HOMGH commented 3 years ago

Hi @Jim61C, I had a question regarding computing the NME. When the output 3D mesh of an algorithm has, for example, 43k vertices (as in PRNet), how can we compare it with a ground-truth mesh which has, say, 53k vertices (as in the AFLW2000 dataset)? Would you please share the NME computation code you used, based on the discussion above? Thank you.