sherjy closed this 3 years ago
Hi,
When I was developing IM-NET, the skip connections were added for one reason: convergence. In the original implementation, the output layer uses a sigmoid activation function, which is known for its vanishing-gradient problem. Without the skip connections, the original IM-NET would not converge.
However, I later found other activation functions, namely clip, h(x) = max(min(x, 1), 0), and leaky clip, h(x) = max(min(x, 0.01x + 0.99), 0.01x), that do not have this gradient problem, so I removed the skip connections to reduce training time. The performance change is as follows (both models use leaky clip as the output-layer activation):
| model name | CD (x1000) | LFD | training time on one Tesla V100 |
| --- | --- | --- | --- |
| IM-NET original | 0.514 | 2029.87 | 62.9 |
| no skip-connections | 0.519 | 2050.40 | 47.8 |
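
For readers who want to see the two output activations mentioned above in concrete form, here is a minimal plain-Python sketch. It is not taken from the IM-NET code. The function names, the `slope` parameter, and the gradient helpers are my own illustration, and the derivative at the kinks x = 0 and x = 1 is taken as the inner slope by convention.

```python
import math

def clip(x):
    """Hard clip: h(x) = max(min(x, 1), 0)."""
    return max(min(x, 1.0), 0.0)

def leaky_clip(x, slope=0.01):
    """Leaky clip: h(x) = max(min(x, slope*x + (1 - slope)), slope*x).

    With slope = 0.01 this is the formula quoted above.
    """
    return max(min(x, slope * x + (1.0 - slope)), slope * x)

def leaky_clip_grad(x, slope=0.01):
    """Slope of leaky_clip: 1 inside [0, 1], `slope` outside (illustrative helper)."""
    return 1.0 if 0.0 <= x <= 1.0 else slope

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid, for comparison."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Compare the values and slopes at a saturated negative input,
# an input inside [0, 1], and a saturated positive input.
for x in (-10.0, 0.5, 10.0):
    print(x, clip(x), leaky_clip(x), leaky_clip_grad(x), sigmoid_grad(x))
```

Far from the origin, the sigmoid's derivative decays exponentially, while leaky clip keeps a constant slope of 0.01. That is one way to read the claim that it avoids the vanishing-gradient issue.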
Thank you for your prompt reply!
Hi Zhiqin, thanks for your great work and code.
I found that you removed all skip connections from the decoder in the improved PyTorch implementation of IM-NET, which is also mentioned in Section 3.2 of your paper: "They can be removed when the feature vector is long, so as to prevent the model from becoming too large." I wonder whether removing them is necessary and how the performance changes, since I consider the skip-connected structure an important part of IM-NET.
Looking forward to your reply. Thanks in advance!