chrischoy / FCGF

Fully Convolutional Geometric Features: Fast and accurate 3D features for registration and correspondence.
MIT License
647 stars 112 forks

Slow training process, same as issue #11 #28

Open ccyycc1994 opened 4 years ago

ccyycc1994 commented 4 years ago

@chrischoy @sjnarmstrong, thanks for sharing your code. I tried it on the 3DMatch dataset with the default configuration and found that training is very slow: one epoch takes about an hour and a half. (As you mention in the paper, FCGF was trained for 100 epochs, which would take more than a week on my setup.) GPU memory usage stays below 5000 MB and GPU utilization is under 10%, while CPU utilization is high. Is this normal, and which part is the most time-consuming? I am training on a V100, and I also found that training is faster on a GTX 1080 Ti than on the V100. I could not find a solution in issue #11, so could you suggest another way to solve this problem?

Thanks a lot.

ccyycc1994 commented 4 years ago

I also tried some of the methods in https://github.com/StanfordVL/MinkowskiEngine/issues/121, but they did not work either.

chrischoy commented 4 years ago

For the V100 being slower than the 1080 Ti, use export OMP_NUM_THREADS=20 or lower.
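
If you launch training from a Python entry point, the same thing can be done in code. A rough sketch (20 is just an example; try lower values too):

```python
# Rough sketch: limit CPU threads to avoid oversubscription on many-core machines.
import os
os.environ["OMP_NUM_THREADS"] = "20"  # must be set before torch / MinkowskiEngine load

import torch
torch.set_num_threads(20)  # also cap PyTorch's own intra-op CPU thread pool
```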

ccyycc1994 commented 4 years ago

@chrischoy If you don't mind me asking, how long does one epoch on the 3DMatch dataset take for you? It takes me 1.5 hours on a GTX 1080. Is that slow, or is that the usual speed?

chrischoy commented 4 years ago

Yes, that is the usual speed.

The default configuration uses batch size = 4, which uses only a fraction of the GPU. Try increasing the batch size.

Also, the codebase is not particularly optimized, but I think some parts could be sped up significantly if you tune the hard negative mining parameters.

ccyycc1994 commented 4 years ago

@chrischoy Have you tried other PyTorch spatially sparse convolution libraries, such as spconv (https://github.com/traveller59/spconv) or SparseConvNet (https://github.com/facebookresearch/SparseConvNet)? Could these libraries speed up the training process? Thanks a lot.

chrischoy commented 4 years ago

No, I haven't. There are several poorly written parts in the data loader that consume a lot of resources. One of them is https://github.com/chrischoy/FCGF/blob/master/lib/data_loaders.py#L257, which uses parallel KD trees to create a large set of correspondence indices and tends to hog CPU resources.

This is not really necessary for computing the loss, since whether a correspondence is correct can be determined from the ground-truth transformation.

I was planning to replace this part with on-the-fly loss computation, but I didn't have much time, so I just left it there.
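
Roughly, the check I have in mind would look like this. This is an untested sketch; the function and argument names are made up, and dist_thresh would be on the order of the voxel size:

```python
import torch

def correspondence_mask(xyz0, xyz1, pairs, T_gt, dist_thresh=0.1):
    """Mark which candidate pairs are correct under the ground-truth transform.

    xyz0: (N0, 3) source points, xyz1: (N1, 3) target points,
    pairs: (M, 2) candidate index pairs into xyz0 / xyz1,
    T_gt: (4, 4) ground-truth transformation from frame 0 to frame 1.
    """
    R, t = T_gt[:3, :3], T_gt[:3, 3]
    p0 = xyz0[pairs[:, 0]] @ R.T + t  # move sampled source points into the target frame
    p1 = xyz1[pairs[:, 1]]            # the target points they are paired with
    return (p0 - p1).norm(dim=1) < dist_thresh
```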

jingyibo123 commented 4 years ago

> No, I haven't. There are several poorly written parts in the data loader that consume a lot of resources. One of them is https://github.com/chrischoy/FCGF/blob/master/lib/data_loaders.py#L257, which uses parallel KD trees to create a large set of correspondence indices and tends to hog CPU resources.
>
> This is not really necessary for computing the loss, since whether a correspondence is correct can be determined from the ground-truth transformation.
>
> I was planning to replace this part with on-the-fly loss computation, but I didn't have much time, so I just left it there.

Hi Chris, I managed to rewrite the function generate_rand_negative_pairs using T_gt to filter out correct correspondences. However, I haven't figured out how to generate correct correspondences on the fly on the GPU without using a KD tree or KNN. Any ideas @chrischoy?
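
The closest I have come is a brute-force search with torch.cdist. An untested sketch (gt_correspondences, dist_thresh, and chunk are names I made up; the distance matrix is chunked to keep GPU memory bounded):

```python
import torch

@torch.no_grad()
def gt_correspondences(xyz0, xyz1, T_gt, dist_thresh=0.1, chunk=4096):
    """Brute-force replacement for the KD tree: align the source points with
    T_gt, then for each one keep the nearest target point if it is within
    dist_thresh. Returns (M, 2) index pairs into xyz0 / xyz1."""
    R, t = T_gt[:3, :3], T_gt[:3, 3]
    p0 = xyz0 @ R.T + t  # source points in the target frame
    pairs = []
    # Process chunk source points at a time: memory is chunk x N1, not N0 x N1.
    for start in range(0, p0.shape[0], chunk):
        d = torch.cdist(p0[start:start + chunk], xyz1)
        min_d, nn = d.min(dim=1)
        keep = min_d < dist_thresh
        idx0 = torch.arange(start, start + d.shape[0], device=xyz0.device)
        pairs.append(torch.stack([idx0[keep], nn[keep]], dim=1))
    return torch.cat(pairs, dim=0)
```

Would something like this be fast enough, or is the quadratic distance computation a dealbreaker for full fragments?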