andyzeng / 3dmatch-toolbox

3DMatch - a 3D ConvNet-based local geometric descriptor for aligning 3D meshes and point clouds.
http://3dmatch.cs.princeton.edu/
BSD 2-Clause "Simplified" License

About convergence rate #11

Closed: YifeiAI closed this issue 6 years ago

YifeiAI commented 6 years ago

Hi, thanks for publishing the code!

I am following your training procedure while porting your code to PyTorch, but I find that the network doesn't converge even after a long time. Could you share your learning curve, or simply describe after how many iterations I should expect it to converge?

I also tried adding batch normalization layers after each conv layer and using ADAM instead of SGD. Do you think this helps accelerate training?

Thank you very much!

andyzeng commented 6 years ago

Hello! Batch normalization should certainly help accelerate the training process. However, I'm uncertain whether ADAM will do better than SGD. If I recall correctly, getting the network to converge also requires a fairly large learning rate (around 0.001), and the weights should be initialized with Xavier initialization rather than Gaussian initialization. With these parameters, you should expect convergence after a few thousand iterations. For the implementation in this repository, the network has converged when the loss oscillates between 0.04 and 0.07 (training time until convergence is typically one night).
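For reference, here is a minimal PyTorch sketch of the setup described above. It is not the repository's actual 3DMatch architecture; the `PatchEncoder` name, layer sizes, momentum value, and the omission of the loss are all placeholders.

```python
import torch
import torch.nn as nn

# A minimal 3D conv block (not the repository's exact 3DMatch architecture)
# illustrating the points above: batch norm after each conv layer, Xavier
# (not Gaussian) weight init, and SGD with a ~0.001 learning rate.
class PatchEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=3),
            nn.BatchNorm3d(64),            # batch norm after each conv layer
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, kernel_size=3),
            nn.BatchNorm3d(128),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                  # x: (batch, 1, D, H, W) voxel patches
        return self.features(x)

def init_weights(m):
    # Xavier initialization instead of Gaussian initialization
    if isinstance(m, nn.Conv3d):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = PatchEncoder()
model.apply(init_weights)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```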

Just as an extra FYI: although in this repository we have a pre-trained model trained for 137k iterations (the most mature model we had before the CVPR deadline ^^), I've been told by other people running the code that equivalent testing performance can be achieved with a converged model trained for only a few thousand iterations. So using models that have already converged but haven't been trained for as long as ours should still work fine.

YifeiAI commented 6 years ago

Hi! Thanks very much for the quick answer.

I looked at the implementation of Xavier initialization in marvin.hpp. It is a little different from the original paper: Marvin divides only by fan_in, while the original paper divides by fan_in + fan_out. Is there a reason to use only fan_in, or is it just for computational simplicity?

andyzeng commented 6 years ago

Marvin's implementation of Xavier initialization was inspired by Caffe's original implementation. A blog post here provides a possible explanation as to why there is no fan_out. I would venture to guess that it was just easier to implement at the time. But it seems to work well.
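For anyone comparing the two conventions, here is a small sketch of the difference. The layer shape is hypothetical, and the formulas are the standard variance rules, not code taken from Marvin or Caffe.

```python
import math

# Hypothetical 3D conv layer: 1 input channel, 64 output channels, 3x3x3 kernel
fan_in = 1 * 3 * 3 * 3      # in_channels  * kernel volume
fan_out = 64 * 3 * 3 * 3    # out_channels * kernel volume

# Caffe/Marvin-style "xavier": weight variance depends only on fan_in
std_fan_in_only = math.sqrt(1.0 / fan_in)

# Glorot & Bengio (original paper): variance uses fan_in + fan_out
std_glorot = math.sqrt(2.0 / (fan_in + fan_out))

print(std_fan_in_only, std_glorot)  # approx. 0.192 vs 0.034 for this layer
```

For a layer like this one, where fan_out is much larger than fan_in, the fan_in-only variant gives noticeably larger initial weights.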

YifeiAI commented 6 years ago

Thanks a lot! The network now converges. I am using PyTorch's implementation of Xavier initialization (which includes fan_out), and it seems to work.

QinZiwen commented 6 years ago

@YifeiAI I have the same problem. Could you please share your PyTorch code? Thanks, my email: qinziwen@emails.bjut.edu.cn