Why is training so slow?

WeijingShi / Point-GNN

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, CVPR 2020.

MIT License


Why is training so slow? #4

Closed: happinesslz closed this issue 4 years ago

happinesslz commented 4 years ago

Thanks for the nice code. I am training it with tensorflow-gpu==1.15.0 on a single Titan Xp GPU, but GPU utilization stays at 0% and GPU memory usage is only 158 MB. I see the same problem on a GTX 1070. Can you explain why? Thanks.

I followed your README and ran the following command (for a single GPU; I also reduced the batch size and NUM_GPU):

CUDA_VISIBLE_DEVICES=7 python train.py configs/car_auto_T3_train_train_config configs/car_auto_T2_train_config

WeijingShi commented 4 years ago

Could you check whether the process is still running on the CPU side? In the current training code, the first epoch is slower because the dataloader is loading the data for the first time. It might take several minutes before training on the GPU starts.
Also, training on the train split takes around 9 GB of memory. If the system runs out of memory, it may be swapping memory to disk, which can slow things down a lot.
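
One way to check the swapping explanation on a Linux machine is to read /proc/meminfo before and during training. The script below is a minimal sketch, not part of the Point-GNN code: the function names parse_meminfo and memory_report are hypothetical, and the 9 GB threshold is simply the rough figure quoted above, not a measured requirement.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_report(info, needed_gb=9.0):
    """Summarize available RAM and swap in use, in GiB.

    needed_gb is an assumption taken from the ~9 GB figure in this thread.
    """
    kib_per_gib = 1024 ** 2
    available_gb = info.get("MemAvailable", 0) / kib_per_gib
    swap_used_gb = (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) / kib_per_gib
    return {
        "available_gb": available_gb,
        "swap_used_gb": swap_used_gb,
        "enough_ram": available_gb >= needed_gb,
    }


# Only read the real file when it exists (Linux); elsewhere the script does nothing.
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(memory_report(parse_meminfo(f.read())))
```

If enough_ram is False before training starts, or swap_used_gb keeps growing while the first epoch is loading, memory pressure on the host is a likely contributor to the slowdown. If memory looks fine and the Python process shows steady CPU activity, the run is probably still in the initial data-loading phase.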