AaronJackson / vrn

:man: Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
http://aaronsplace.co.uk/papers/jackson2017recon/
MIT License

Placement of voxelisation of ground truth shape in z #89

Closed Jim61C closed 6 years ago

Jim61C commented 6 years ago

Hi!

Thank you for making the feed-forward code available! I would like to understand how the voxelisation result is placed within the cube. I understand that the x-y extent of the voxel volume aligns with the 192x192 image, since an orthographic camera model is used. What convention did you use to position the volume along the z axis? Is it placed so that the volume is always centred in z? Also, may I ask how much of the model is cut off at the back? Is there a particular threshold used? Thank you very much!

AaronJackson commented 6 years ago

The positioning has to be consistent; if it moves about in the Z component it will just look like noise during training. If I remember correctly, for this particular model the Z placement is done using the mean of the original mesh. I've also done this based on the point closest to the camera (usually the tip of the nose in frontal images, for example). I'm not sure which works best, and off the top of my head I'm not completely sure which way I did it for this particular model.
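
For illustration only, here is a minimal numpy sketch of the two placement conventions just described (the function name, depth, and margin are placeholders, not values from the training code):

```python
import numpy as np

def place_in_z(vertices, depth=200, mode="mean"):
    """Shift mesh vertices so the face sits consistently along z.

    vertices: (N, 3) array of x, y, z coordinates; z is assumed to
              increase towards the camera (placeholder convention).
    depth:    number of voxel slices along z (placeholder value).
    """
    v = np.asarray(vertices, dtype=float).copy()
    if mode == "mean":
        # centre the mesh's mean z on the middle slice of the volume
        v[:, 2] += depth / 2.0 - v[:, 2].mean()
    elif mode == "nearest":
        # pin the point closest to the camera (e.g. the nose tip in a
        # frontal image) a fixed number of slices from the front
        front_margin = 10  # placeholder margin
        v[:, 2] += (depth - front_margin) - v[:, 2].max()
    else:
        raise ValueError("mode must be 'mean' or 'nearest'")
    return v
```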

The output of the network is not thresholded, at least in the MATLAB version, where I just let MATLAB decide. If you do threshold it, the output will be blockier (with most marching-cubes-like methods), since the isosurface can no longer be smoothed using the voxel intensities.
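
For example, with scikit-image (not the code shipped in this repository), the surface can be pulled straight from the raw volume like so:

```python
import numpy as np
from skimage import measure

# vol: (X, Y, Z) float array of raw network outputs (file name is a placeholder)
vol = np.load("raw_volume.npy")

# Extract the isosurface directly from the continuous voxel intensities.
# Binarising first (e.g. vol > 0.5) would leave marching cubes only 0s and 1s
# to interpolate between, giving the blocky surface described above.
verts, faces, normals, values = measure.marching_cubes(vol, level=0.5)
```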

If you are computing error for a publication, I would appreciate it if the comparison was done against the MATLAB implementation. The Python implementation uses quite a bad isosurface function.

Jim61C commented 6 years ago

Thank you very much for the clarification! For the thresholding, yes, no thresholding is done in the iso-surface calculation, precisely to avoid the blocky effect you mention. What I meant was the voxelisation process: the current model seems to cut off the ears, so I am wondering how the 3DMM meshes are cut to generate the voxelisation result for training. Thank you!

AaronJackson commented 6 years ago

Ah, I see. We manually selected the frontal region of a frontal face, in advance, and then used those saved indices for each face we voxelised.
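
As a rough sketch (the filename and helper function are hypothetical, not the actual preprocessing script), applying such a saved index list before voxelisation might look like this:

```python
import numpy as np

# hypothetical file containing the saved frontal-face vertex indices
keep = np.loadtxt("frontal_vertex_indices.txt", dtype=int)

def crop_frontal(vertices, faces, keep):
    """Keep only the pre-selected frontal vertices and the triangles that use them."""
    n = len(vertices)
    mask = np.zeros(n, dtype=bool)
    mask[keep] = True
    # keep triangles whose three vertices all survive the filter
    face_mask = mask[faces].all(axis=1)
    # remap old vertex indices to the new, compacted numbering
    remap = -np.ones(n, dtype=int)
    remap[keep] = np.arange(len(keep))
    return vertices[keep], remap[faces[face_mask]]
```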

Jim61C commented 6 years ago

Cool, thanks! Would you mind providing those indices, or some guidance on how to obtain them? I can see the face contour indices in the 3DDFA code base, but I am not sure how to derive the frontal face indices from that contour. Thanks!

Another question: it seems the ground truth voxel volume used for supervision is actually the mesh with its z component scaled by 2, which results in a 'warped' face shape during learning. Is there a particular reason for doing this instead of learning the unscaled shape? Thank you!

AaronJackson commented 6 years ago

Hey, the vertices can be found here: http://cs.nott.ac.uk/~psxasj/download.php?file=vrn-3ddfa-vertexfilter

The original purpose of scaling the Z component was to try and improve detail. I'm not actually sure how much it helped, as we never did any quantitative analysis of this. The code provided scales the Z component by 0.5, if I remember correctly.
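
Roughly, the idea is as follows (a hedged sketch, not the released code; the constant, level value, and function names are placeholders):

```python
import numpy as np
from skimage import measure

Z_SCALE = 2.0  # depth exaggeration applied to the ground-truth mesh

def scale_for_voxelisation(vertices):
    # stretch the mesh in z before rasterising it into the training volume
    v = np.asarray(vertices, dtype=float).copy()
    v[:, 2] *= Z_SCALE
    return v

def mesh_from_volume(vol):
    # undo the exaggeration when extracting a surface from a predicted volume
    verts, faces, _, _ = measure.marching_cubes(vol, level=0.5)
    verts[:, 2] /= Z_SCALE
    return verts, faces
```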

Jim61C commented 6 years ago

Great, thanks for the vertex indices!

I see. Yep, during the iso-surface calculation the raw volume is scaled back by a factor of 0.5, which means the original raw volume was scaled by 2. I was wondering whether there is a principle or rule of thumb behind this.

Thanks.