AaronJackson / vrn

:man: Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
http://aaronsplace.co.uk/papers/jackson2017recon/
MIT License

About Hole Filling #108

Closed: AXIHIXA closed this issue 5 years ago

AXIHIXA commented 5 years ago

Hi Aaron,

Thank you for this brilliant repo!

I'm trying to reproduce the unguided VRN, and I'm curious about the hole-filling algorithm you used during the voxelization process. The meshes in the 300W_LP dataset are not closed: there are holes not only at the mouth but also at the back of the head.

I noticed that in other issues you mention that voxels both on and inside the surface should be set to 1. The hole at the mouth can be filled easily with something like Delaunay triangulation, but I have no idea how to handle the much bigger one at the back. Unfortunately, without a mesh that is completely watertight at the back, there is no way to tell whether a voxel is "enclosed by the 3D scan" or "in the background".

In particular, the output of the released VRN model is flat at the back of the head, while the border edges of the large hole in the 300W_LP meshes do not lie on a single plane. How did you achieve this in the training data? When filling holes, did you threshold on the z coordinate, crop everything behind it, and treat that plane as the back surface of the mesh?

I'm an undergraduate student, so this issue may seem a bit naive. Please feel free to close it if it's a bother.

Thank you, Xi

AaronJackson commented 5 years ago

Hi Xi, We rotated the face to a frontal pose using the pose parameters available in 300W_LP. From this we interpolated a depth map and backfilled to a fixed point. The 3D volume is then transformed back to its original pose and position.
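Roughly, the depth-map step would look like this in (untested) Python - the grid size, coordinate conventions and the fixed back plane are placeholders rather than the actual code:

```python
# Untested sketch of the "interpolate a depth map and backfill" step,
# assuming frontalized vertices already scaled into voxel units.
# The grid size and the fixed back plane (back_z) are placeholders.
import numpy as np
from scipy.interpolate import griddata

def voxelize_frontal(vertices, size=(192, 192, 200), back_z=0):
    """vertices: (N, 3) array of frontalized mesh vertices."""
    vol = np.zeros(size, dtype=np.uint8)
    # Interpolate a dense depth map z(x, y) over the voxel grid from
    # the scattered frontal vertices.
    gx, gy = np.mgrid[0:size[0], 0:size[1]]
    depth = griddata(vertices[:, :2], vertices[:, 2], (gx, gy), method='linear')
    # Backfill: every voxel from the fixed back plane up to the surface
    # counts as "inside"; NaNs fall outside the face region.
    for x in range(size[0]):
        for y in range(size[1]):
            z = depth[x, y]
            if not np.isnan(z):
                zi = min(int(np.round(z)), size[2] - 1)
                vol[x, y, back_z:zi + 1] = 1
    return vol
```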

Hopefully this answers all of your questions. Just ask if not.

AXIHIXA commented 5 years ago

Hi Aaron,

Thank you for your swift reply. It really helps me a lot.

Now I have another question: the rotation matrix and transformation matrix contain non-integer (decimal) values,

so when I rotate the volume back after voxelization, the value of a given voxel is no longer a hard 0/1 but a fractional number.

Do you just drop the fractional parts during training, or keep them?

At least when visualizing the voxelized faces, the result looks really poor if I drop the fractional part.

Thank you,

Xi


AaronJackson commented 5 years ago

During training we use binary cross-entropy loss, so yes, you can just round it to 0 or 1. During inference, though, the network will regress a value which is close to 0 or 1, and you can use this to smooth the output, or potentially increase the amount of detail if your training data contains it. The surface extraction algorithm needs to support this, and none of the Python implementations do.
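In PyTorch terms (a placeholder sketch; we did not use PyTorch, so none of this is the original training code), the loss side amounts to something like:

```python
# Placeholder PyTorch sketch of the loss side; `net` is any volumetric
# regression network producing one logit per voxel.
import torch
import torch.nn.functional as F

def training_step(net, image, warped_volume):
    """warped_volume: float tensor with fractional values after the
    inverse warp; binarize it before computing the loss."""
    target = (warped_volume > 0.5).float()   # round to hard 0/1 occupancy
    pred = torch.sigmoid(net(image))         # per-voxel occupancy in (0, 1)
    return F.binary_cross_entropy(pred, target)
```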

In summary, don't worry if it looks blocky during training, it probably won't matter once the network is trained.

Aaron


AXIHIXA commented 5 years ago

Hi Aaron,

Thank you for your swift reply.

I plan to do the training in PyTorch, and I will let you know if I succeed in reimplementing it. There may be some other details I need to ask about during training.

Anyway, thank you for your reply.

Best,

Xi


AXIHIXA commented 5 years ago

Hi Aaron,

Sorry for troubling you.

I'm writing to confirm what exactly you rounded during the realignment process.

In 300W_LP's MATLAB script, the final result of vertices for rendering is:

ScalingParameter * RotationMatrix * Vertices + TranslationMatrix,

and I sent Vertices directly into voxelization function.

But after voxelization I only have a binary volume, not vertices, while the RotationMatrix in 300W_LP's MATLAB script is a 3×3 matrix that operates on vertices. What I tried was to extract the non-zero voxels of the volume and concatenate their indices (ranging over [1, 192]) into a 3×N matrix,

then set ground truth = floor(RotationMatrix * IndicesVector) + TranslationMatrix,

where the TranslationMatrix has been rescaled from mesh coordinates to voxel units.
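In NumPy terms (0-based indices here; the helper name is mine), the attempt looks roughly like:

```python
# Rough NumPy equivalent of what I tried (0-based indices, names mine).
import numpy as np

def realign_by_indices(volume, R, t):
    """volume: binary (192, 192, 200) array; R: 3x3 rotation;
    t: translation already rescaled into voxel units."""
    out = np.zeros_like(volume)
    idx = np.argwhere(volume > 0).T                       # 3 x N occupied indices
    new_idx = (np.floor(R @ idx) + t[:, None]).astype(int)  # floor(R*idx) + t
    # Scatter occupied voxels to their rounded destinations, dropping
    # anything that lands outside the grid.
    keep = ((new_idx >= 0) &
            (new_idx < np.array(volume.shape)[:, None])).all(axis=0)
    x, y, z = new_idx[:, keep]
    out[x, y, z] = 1
    return out
```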

But this "ground truth" looks really terrible.

And note that what I actually rounded was the index of each voxel to an integer in [1, 192], not the value of a voxel to a boolean 0/1 (as you described in your last email).

So I want to ask:

  1. What did you do in the realignment process?

    Did you turn the rotation matrix into something that works directly on a volume, or did you round the indices as I did?

  2. Is it correct to handle the TranslationMatrix the way I did?

Thank you so much,

Xi


AaronJackson commented 5 years ago

You won't get a set of vertices from the volume - that's the point of it. What you have is a 3D matrix containing inside or outside states. You can apply the inverse of your transformation directly to this volume, just as you would to an image - but you may need to write a function to do this, as most implementations only work in 2D.

The general order for applying transformations is scale, rotation and translation. You can multiply all of these matrices together to produce a single transformation matrix. The inverse of the transformation will do just that, the inverse. Hence, V == V * T * inv(T).

You apply the transformation to the vertices, and the inverse to the volume. That way you preserve the scale, rotation and translation, while being able to voxelise a non-closed mesh over the Z axis.

AXIHIXA commented 5 years ago

Hi Aaron, I do wonder how you apply the inverse of the transformation (which is, after all, a 3×3 rotation matrix that works only on vertices) "directly to a 192 × 192 × 200 volume". Would you mind explaining this in more detail?

From what you said, perhaps you wrote a function that converts the rotation matrix into something of a shape that can be multiplied with the volume?

Sorry for not getting your point the first time.


AaronJackson commented 5 years ago

Look up how to apply a transformation matrix to an image and extend it to have a third dimension.

AaronJackson commented 5 years ago

Examples: https://uk.mathworks.com/help/images/matrix-representation-of-geometric-transformations.html
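
In Python, scipy.ndimage.affine_transform plays the same role and already handles n-D arrays. A sketch (the sign conventions depend on how you built your volume, so treat this as illustrative rather than the original code):

```python
# Illustrative Python counterpart: scipy.ndimage.affine_transform works on
# n-D arrays, so the 2-D image recipe extends to 3-D volumes unchanged.
import numpy as np
from scipy.ndimage import affine_transform

def unfrontalize(volume, s, R, t, order=1):
    """Warp a frontal-pose volume back to the original pose.

    Assumes the mesh was frontalized with  v' = s * R @ v + t.
    affine_transform uses "pull" semantics: for each output voxel it asks
    where that voxel came from in the input, so the *forward* frontalization
    parameters are exactly what undo the frontalization here.
    """
    # Cast to float so interpolated (fractional) values survive; order=1 is
    # trilinear interpolation, the source of the non-0/1 voxel values.
    # Threshold the result at 0.5 afterwards for a hard binary volume.
    return affine_transform(volume.astype(np.float32), s * R, offset=t,
                            order=order)
```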