facebookresearch / DeepSDF

Learning Continuous Signed Distance Functions for Shape Representation
MIT License

error during reconstruction using small batch sizes #31

Open HM102 opened 5 years ago

HM102 commented 5 years ago

When I train with small batch sizes (e.g. 3 or 4), I get an error during reconstruction because the network cannot predict negative SDF values, even though everything looks fine during training.

If I replace decoder.eval() with decoder.train(), I get normal reconstructions. So I suspect the problem lies in the dropout scaling difference or in weight normalization behaving differently between training and testing. @tschmidt23 your feedback is much appreciated!
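For context, here is a minimal sketch (not the DeepSDF decoder itself) of why decoder.train() and decoder.eval() can give different outputs when dropout is present: nn.Dropout zeroes activations stochastically in train mode (rescaling the survivors by 1/(1-p)) and acts as the identity in eval mode. Weight normalization, by contrast, behaves identically in both modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network with dropout; layer sizes are arbitrary for illustration.
layer = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.2), nn.Linear(8, 1))
x = torch.randn(4, 8)

layer.train()
out_train = layer(x)   # stochastic: a fresh dropout mask per forward pass
layer.eval()
out_eval = layer(x)    # deterministic: dropout is a no-op in eval mode

print(out_train.squeeze())
print(out_eval.squeeze())
```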

tschmidt23 commented 4 years ago

Weight normalization should be the same in train and eval. It sounds like the network has failed to learn anything -- as you say, reconstruction will fail if the predicted SDF values are all positive or all negative. How long had the network trained before you tried reconstruction? Did the training loss decrease? One potential problem that could lead to this issue is poor initialization of the network combined with SDF clamping: if the network predicts values outside the clamping distance there is no gradient, so an initialized network that predicts all values outside this range will never learn anything.
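A minimal sketch of this failure mode, assuming a DeepSDF-style clamped L1 loss with delta = 0.1: when every prediction lies outside the [-delta, delta] band on the same side, torch.clamp saturates and the gradient with respect to the predictions is exactly zero.

```python
import torch

delta = 0.1
# All predictions above the clamping distance, as a badly initialized
# network might produce.
pred = torch.full((5,), 0.5, requires_grad=True)
gt_sdf = torch.tensor([0.03, -0.02, 0.08, -0.05, 0.01])

# Clamped L1 loss in the style of the DeepSDF paper.
loss = torch.mean(torch.abs(torch.clamp(pred, -delta, delta)
                            - torch.clamp(gt_sdf, -delta, delta)))
loss.backward()
print(pred.grad)  # all zeros: clamp saturated, no learning signal
```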

wen-yuan-zhang commented 3 years ago

I met the same problem while trying single-object reconstruction. The network trained for 2000 epochs, during which the loss decreased from 0.042 to 0.0063. While reconstructing, I noticed that every initialized latent code produced values > 0.1 through the well-trained decoder, so there was no data-term gradient for the latent code (only the regularization gradient remained); in other words, the gradient from the SDF loss was all zero. I didn't know how to change the initialization, so I disabled the clamping instead (commenting out lines 58 and 74 in reconstruct.py). After this, the latent code was learned correctly and the mesh was reconstructed correctly.
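A simplified sketch of this workaround (assumed names, not the repo's exact code): the latent-code optimization from reconstruct.py with the clamp removed. A tiny stand-in decoder and random samples make it self-contained; in practice `decoder` is the trained, frozen DeepSDF network and `xyz`/`gt_sdf` come from the SDF samples of the target shape.

```python
import torch

# Stand-in for the trained DeepSDF decoder (frozen during reconstruction).
decoder = torch.nn.Sequential(torch.nn.Linear(256 + 3, 128),
                              torch.nn.ReLU(),
                              torch.nn.Linear(128, 1))
for p in decoder.parameters():
    p.requires_grad_(False)

xyz = torch.randn(1024, 3)            # query points near the shape
gt_sdf = torch.randn(1024, 1) * 0.05  # their ground-truth SDF values

# Only the latent code is optimized.
latent = torch.ones(1, 256).normal_(0, 0.01).requires_grad_(True)
optimizer = torch.optim.Adam([latent], lr=5e-3)
loss_l1 = torch.nn.L1Loss()

for step in range(200):
    optimizer.zero_grad()
    inputs = torch.cat([latent.expand(xyz.shape[0], -1), xyz], dim=1)
    pred_sdf = decoder(inputs)
    # No torch.clamp(pred_sdf, -0.1, 0.1) here: with clamping, a decoder
    # that outputs only values outside [-0.1, 0.1] yields zero data-term
    # gradient, leaving only the latent regularizer below to act.
    loss = loss_l1(pred_sdf, gt_sdf) + 1e-4 * torch.mean(latent ** 2)
    loss.backward()
    optimizer.step()
```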