AaronJackson / vrn

:man: Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
http://aaronsplace.co.uk/papers/jackson2017recon/
MIT License
4.52k stars 746 forks

How to get smooth result? #96

Closed · likewind1993 closed 5 years ago

likewind1993 commented 5 years ago

Thank you for your work, but I still have two questions.

1. I implemented the network (VRN) from the paper. To verify it, I trained it on a single sample for 10,000 epochs and got the results shown in the attached images. I output the sigmoid values directly, but the surface is not as smooth as in the paper. I want to know whether you applied any additional processing to get the smooth result.
2. In the voxelisation process, why did you rotate the mesh to frontal and then realign it again? In other words, why not voxelise the mesh directly? As far as I know, the realignment step in the first method causes rounding problems, because the realigned coordinates are not always integers.

Looking for your reply!

AaronJackson commented 5 years ago

Hey, nice job. Great to see people reimplementing the work. What surface extraction function are you using? The ones in Python are bad. Try the one in MATLAB?

Your spatial alignment looks off very slightly. This will hurt performance significantly. Not sure whether you introduced this during the voxelisation process or whether this was introduced during your plotting.

We rotated the mesh to frontal during "voxelisation" because our meshes were not watertight. They could have been made watertight; maybe that's what you did.
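As I read this reply, frontal alignment makes the fill step trivial: rasterise the nearest (front) and farthest (back) surface depth at each pixel and mark every voxel in between as occupied, with no watertightness requirement. A minimal numpy sketch of that idea (`fill_between_depth_maps` and the 200-slice default are my own illustration, not the repo's code):

```python
import numpy as np

def fill_between_depth_maps(front, back, depth_bins=200):
    """Fill a binary volume between per-pixel front and back depth maps.

    front, back: (H, W) arrays of surface depths for a frontally aligned mesh.
    Every voxel between the two depths is marked occupied, so a hole in the
    mesh (non-watertight geometry) does not break the fill.
    """
    z = np.arange(depth_bins).reshape(-1, 1, 1)          # (D, 1, 1) depth axis
    vol = ((z >= front) & (z <= back)).astype(np.uint8)  # broadcast to (D, H, W)
    return vol

# Tiny example: one pixel whose surface spans depths 2..5
front = np.full((1, 1), 2)
back = np.full((1, 1), 5)
vol = fill_between_depth_maps(front, back, depth_bins=8)
```

A non-frontal mesh would not give a single front/back depth pair per pixel, which is (I assume) why the rotation is done before voxelising.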

likewind1993 commented 5 years ago

@AaronJackson Thank you for your reply!

The surface extraction function I used is the one in MATLAB.

Because I used a rough method to rescale the model (estimating a scale factor F, a rotation matrix R, and a translation T) and realigned it with the rescaled images (192x192), it may not suit all samples and could introduce error.
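For reference, a similarity alignment of the kind described above can be sketched like this (`apply_similarity` is a hypothetical helper, not code from this repo; the rounding at the end is where the non-integer coordinate problem from the original question shows up):

```python
import numpy as np

def apply_similarity(vertices, s, R, t):
    """Apply a similarity transform: scale s, rotation R (3x3), translation t (3,).

    vertices: (N, 3) array of mesh vertex coordinates.
    """
    return s * vertices @ R.T + t

# Example: double the scale, no rotation, shift by +1 on x
v = np.array([[1.0, 0.0, 0.0]])
out = apply_similarity(v, 2.0, np.eye(3), np.array([1.0, 0.0, 0.0]))

# Snapping to voxel indices is the rounding step that loses precision
idx = np.round(out).astype(int)
```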

I output the sigmoid values directly; maybe training on only one sample led to overfitting, and a smoother surface would be obtained with more samples.

But I still don't understand the "threshold" you used in training. Is it used for calculating accuracy, or for calculating the loss? Because the predicted V_hat values cannot be exactly 0 or 1, which could make the loss NaN, I am confused about it.

AaronJackson commented 5 years ago

Yes, I was going to suggest it might have overfit since you only used one image.

The BCE loss function (at least in torch) allows the output to be between 0 and 1, but the target must be binary. https://github.com/torch/nn/blob/master/doc/criterion.md#nn.BCECriterion
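On the NaN worry: the usual guard is to clamp the post-sigmoid predictions away from exactly 0 and 1 before taking logs (the built-in criteria do something equivalent internally, as I recall). A framework-agnostic sketch of BCE with that clamp (`bce_loss` and the `eps` value are my own illustration, not the repo's code):

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy with predictions clamped away from 0 and 1.

    pred: post-sigmoid outputs, nominally in (0, 1).
    target: binary occupancy labels (0 or 1), as BCE requires.
    The eps clamp keeps log() finite, avoiding NaN at exactly 0 or 1.
    """
    p = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

# Includes exact 0 and 1, which an unclamped log would turn into inf/NaN
pred = np.array([0.9, 0.1, 1.0, 0.0])
target = np.array([1.0, 0.0, 1.0, 0.0])
loss = bce_loss(pred, target)
```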

AaronJackson commented 5 years ago

BTW, you can also train with L2 loss. Last time I tried it, it worked ok. And this would allow you to learn the softer values more easily.
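The L2 alternative is just mean squared error over the volume, and unlike a binary-target BCE criterion it accepts soft (non-binary) occupancy values directly. A minimal sketch (my own illustration, not the repo's code):

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared error -- lets the network regress soft occupancy values."""
    return float(np.mean((pred - target) ** 2))

# Soft targets near the surface are fine here, unlike with binary-target BCE
pred = np.array([0.8, 0.3])
target = np.array([0.7, 0.2])
loss = l2_loss(pred, target)
```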

likewind1993 commented 5 years ago

OK, I will try your suggestions. Thank you again for your help!

AaronJackson commented 5 years ago

Okay, please keep me updated. You are the closest I have seen yet to reimplementing the training. Maybe someone else has done it but not let me know :)

likewind1993 commented 5 years ago

@AaronJackson Hi, I've run into a serious problem. Could you tell me how many samples you used in training? Were all of the meshes generated by 3DDFA? I find it too time-consuming: even though I only used 600 samples from it, it runs only 150 epochs a day. It may have something to do with my use of PyTorch.

But I still want to know how many samples you used and how many epochs you ran.

AaronJackson commented 5 years ago

As stated in the paper we trained with approximately 60,000 facial models. Perhaps 6,000 were unique images, but the faces had been warped with the 3DDFA code, which allows our method to work on large poses.

150 epochs is not required. We trained for maybe 50-60 epochs, but on a large amount of data. Regressing 200 depth-wise slices is not going to be particularly fast, especially when you have to do backprop as well.

Perhaps the slowest part for you will actually be to generate the voxelised data?

https://arxiv.org/pdf/1703.07834.pdf

likewind1993 commented 5 years ago

@AaronJackson Maybe I missed something; generating the voxel data is actually the fastest part for me. Thank you again! I hope it works this time... Reimplementing a paper is so difficult... =.=