facebookresearch / DeepSDF

Learning Continuous Signed Distance Functions for Shape Representation

Bug: All Latent Vectors Optimized After Each Batch #51

Open edgar-tr opened 4 years ago

edgar-tr commented 4 years ago

Hi Tanner,

The auto-decoder implementation has a bug that optimizes all latent codes after every single batch, even though only the batch's latent codes should be optimized. This happens because Adam simply uses whatever gradients are stored at each latent vector, regardless of whether those gradients were produced by the current batch or by some earlier one. Since the gradients are never cleared, Adam reuses them after each batch and thereby modifies all latent codes every time. (TensorFlow without eager mode would not have this problem.)

This can be verified by printing, e.g., the 0-th latent code after every single batch: its values change after each batch even though it only occurs in some of the batches.
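
A minimal, self-contained sketch (not the official training loop; the decoder and sizes are made up for illustration) that reproduces the effect when the latent codes are stored as a single parameter tensor and gradients are never cleared:

```python
import torch

num_shapes, latent_dim = 4, 8
lat_vecs = torch.nn.Parameter(0.01 * torch.randn(num_shapes, latent_dim))
decoder = torch.nn.Linear(latent_dim, 1)  # stand-in for the SDF decoder
optimizer = torch.optim.Adam([lat_vecs, *decoder.parameters()], lr=1e-3)

batches = [[0, 1], [2, 3], [2, 3]]  # shape indices per batch; shape 0 appears only once
for step, indices in enumerate(batches):
    # Gradients are never cleared, so lat_vecs.grad still carries the
    # contribution of earlier batches when Adam takes the next step.
    loss = decoder(lat_vecs[indices]).abs().mean()
    loss.backward()
    optimizer.step()
    # Latent code 0 keeps changing even in batches that do not contain it.
    print(f"step {step}, batch {indices}, latent 0: {lat_vecs[0, :3].tolist()}")
```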

I have not confirmed this with a clean version of the official code, but I very much suspect it holds true.

I have unfortunately not come up with a good solution for this. Instead, my code exploits the fact that Adam does not update parameters whose gradient is None (see https://pytorch.org/docs/stable/_modules/torch/optim/adam.html#Adam.step): I reset the gradients of the current batch's latent vectors to None after optimizer.step(). This keeps every latent vector's gradient at None unless that vector is in the current batch. As far as I can tell from the Adam code, this should also give the intended behavior for the momentum part of Adam. After the bugfix, DeepSDF's performance improved noticeably for me, though not dramatically, both quantitatively and qualitatively.
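
For reference, a sketch of this workaround, assuming each shape's latent code is its own nn.Parameter (as in my older version of the code); the decoder and sizes are again placeholders:

```python
import torch

num_shapes, latent_dim = 4, 8
lat_vecs = [torch.nn.Parameter(0.01 * torch.randn(latent_dim)) for _ in range(num_shapes)]
decoder = torch.nn.Linear(latent_dim, 1)  # stand-in for the SDF decoder
optimizer = torch.optim.Adam([*lat_vecs, *decoder.parameters()], lr=1e-3)

for indices in [[0, 1], [2, 3], [2, 3]]:  # shape indices per batch
    decoder.zero_grad()  # latent gradients are handled below, not zeroed here
    batch_codes = torch.stack([lat_vecs[i] for i in indices])
    loss = decoder(batch_codes).abs().mean()
    loss.backward()
    optimizer.step()
    # Reset the batch's latent gradients to None after the step; Adam skips
    # any parameter whose .grad is None, so only this batch's codes move.
    for i in indices:
        lat_vecs[i].grad = None
```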

I use an older version of the code, though, so I don't know whether my workaround carries over directly when the embedding data structure is used to store the latent codes. It might be possible to modify Adam's parameter groups instead, but I don't know how that interacts with self.state[p] in step(), which stores the momentum state. (Maybe "p" is somehow broken as a dictionary key because it comes from the embedding structure.)
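
One alternative that might work with the embedding structure (I have not tried it) is to make the embedding sparse and use torch.optim.SparseAdam, whose lazy update, moment estimates included, only touches the rows that received gradients in the current batch. A rough sketch, with the decoder and sizes again made up:

```python
import torch

num_shapes, latent_dim = 4, 8
lat_vecs = torch.nn.Embedding(num_shapes, latent_dim, sparse=True)
decoder = torch.nn.Linear(latent_dim, 1)  # stand-in for the SDF decoder

# SparseAdam only accepts parameters with sparse gradients, so the decoder
# needs its own (dense) Adam instance.
optimizer_lat = torch.optim.SparseAdam(lat_vecs.parameters(), lr=1e-3)
optimizer_dec = torch.optim.Adam(decoder.parameters(), lr=1e-3)

for indices in [[0, 1], [2, 3], [2, 3]]:  # shape indices per batch
    optimizer_lat.zero_grad()
    optimizer_dec.zero_grad()
    batch_codes = lat_vecs(torch.tensor(indices))
    loss = decoder(batch_codes).abs().mean()
    loss.backward()  # the embedding gradient is sparse: only the batch's rows
    optimizer_lat.step()
    optimizer_dec.step()
```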

I want to note that the gradients are not somehow magically set to 0 after each batch, neither in the current release nor in my code. Doing so would not lead to the intended behavior anyway, because Adam's momentum term would keep updating all latent codes even with zero gradients.
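
A tiny standalone check of that last point (again not DeepSDF code): even after a gradient is zeroed, Adam keeps moving the parameter, because the first-moment estimate from the previous step is still non-zero:

```python
import torch

p = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.Adam([p], lr=1e-1)

p.grad = torch.ones(1)   # one "real" gradient step
opt.step()

p.grad = torch.zeros(1)  # gradient zeroed, as if zero_grad() had been called
before = p.item()
opt.step()
print(before, p.item())  # the parameter still changes due to momentum
```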

Best regards, Edgar