Question of removing all skip connections

sherjy commented 3 years ago

Hi Zhiqin, thanks for your great work and code.

I noticed that you removed all skip connections from the decoder in the improved PyTorch implementation of IM-NET. This is also mentioned in Section 3.2 of your paper: "They can be removed when the feature vector is long, so as to prevent the model from becoming too large." I wonder whether removing them is necessary and how the performance changes, since I consider the skip-connected structure an important part of IM-NET.

Looking forward to your reply. Thanks in advance!
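
To make the model-size remark quoted above concrete, here is a minimal sketch that counts the parameters of a fully connected decoder with and without skip connections. It assumes a skip connection means concatenating the decoder input (feature vector plus 3D point coordinates) to the output of every hidden layer. The helper name `mlp_param_count`, the layer widths, and the feature lengths are all hypothetical and are not taken from the repository.

```python
def mlp_param_count(input_dim, hidden_dims, output_dim, skip=False):
    """Count weights and biases of a fully connected decoder.

    With skip=True, the original input vector is assumed to be concatenated
    to the output of every hidden layer, so each later layer receives
    `input_dim` extra input features.
    """
    total = 0
    in_dim = input_dim
    for width in list(hidden_dims) + [output_dim]:
        total += in_dim * width + width  # weight matrix + bias vector
        # Input size of the next layer: this layer's output,
        # plus the copied decoder input when skip connections are used.
        in_dim = width + (input_dim if skip else 0)
    return total


hidden = [1024, 512, 256]      # hypothetical hidden layer widths
for z_dim in (128, 1024):      # hypothetical short and long feature vectors
    d = z_dim + 3              # feature vector + (x, y, z) point coordinates
    plain = mlp_param_count(d, hidden, 1, skip=False)
    skipped = mlp_param_count(d, hidden, 1, skip=True)
    print(f"z_dim={z_dim}: no skip={plain}, skip={skipped}, extra={skipped - plain}")
```

Under this assumption, the extra parameter count is the input length multiplied by the total width of all layers after the first, so it grows linearly with the feature vector length.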

czq142857 commented 3 years ago

Hi,

When I was developing IM-NET, the skip connections were added for one reason: convergence. In the original implementation, the output layer uses a sigmoid activation function, which is known for its vanishing-gradient problem. Without the skip connections, the original IM-NET does not converge.

However, I later found other activation functions, namely clip, h(x) = max(min(x, 1), 0), and leaky clip, h(x) = max(min(x, 0.01x + 0.99), 0.01x), that do not have this gradient problem, so I removed the skip connections to reduce training time. The performance change is as follows (both models use leaky clip as the output-layer activation):

model name            |  CD (x1000) |    LFD     |  training time on one Tesla V100
IM-NET original       |    0.514    |  2029.87   |  62.9
no skip-connections   |    0.519    |  2050.40   |  47.8
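
For reference, the output activations mentioned above can be written in a few lines of plain Python. This is only a sketch of the formulas as stated in this comment, not the code from the repository. The `slope` helper is a hypothetical addition that estimates the derivative numerically, so the three functions can be compared at a few sample inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid; its slope is at most 0.25 and decays toward 0 for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def clip(x):
    """clip: h(x) = max(min(x, 1), 0). Identity on [0, 1], constant outside."""
    return max(min(x, 1.0), 0.0)


def leaky_clip(x):
    """leaky clip: h(x) = max(min(x, 0.01x + 0.99), 0.01x).

    Identity on [0, 1]; outside that range the slope is 0.01.
    """
    return max(min(x, 0.01 * x + 0.99), 0.01 * x)


def slope(f, x, eps=1e-6):
    """Central-difference estimate of df/dx (hypothetical helper for illustration)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


for x in (-10.0, 0.5, 10.0):
    print(
        f"x={x:6.1f}  "
        f"sigmoid'={slope(sigmoid, x):.6f}  "
        f"clip'={slope(clip, x):.6f}  "
        f"leaky_clip'={slope(leaky_clip, x):.6f}"
    )
```

The sample points are chosen away from the kinks at 0 and 1, where the central difference would average the slopes on both sides.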

sherjy commented 3 years ago

Thank you for your prompt reply!

czq142857 / IM-NET-pytorch

Question of removing all skip connections #5