dvlab-research / LargeKernel3D

LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs (CVPR 2023)
https://arxiv.org/abs/2206.10555
Apache License 2.0
188 stars · 8 forks

About training time #8

Closed fjzpcmj closed 1 year ago

fjzpcmj commented 1 year ago

Dear Author, I am training the detection model with the config "nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_largekernel3d_large.py" on two V100 GPUs. The conv type is 'spatialgroupconvv2'. It looks like training for 20 epochs will take about 12 days. When training with four V100 GPUs, the estimated time is also about 12 days. Is this normal? It would be very helpful if you could share your training logs, or at least your training time and the number of GPUs you used.

Here is my training log with 2 GPUs: 2023-05-08 19:00:44,415 - INFO - Epoch [1/20][510/30895] lr: 0.00010, eta: 11 days, 17:20:45, time: 1.361, data_time: 0.121, transfer_time: 0.019, forward_time: 0.327, loss_parse_time: 0.001 memory: 7336,

Here is my training log with 4 GPUs: 2023-05-09 10:22:58,283 - INFO - Epoch [1/20][100/15448] lr: 0.00010, eta: 14 days, 20:59:34, time: 3.149, data_time: 0.254, transfer_time: 0.015, forward_time: 0.333, loss_parse_time: 0.000 memory: 6913,
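As a back-of-envelope check on those log lines, the remaining time is roughly the per-iteration `time` multiplied by the iterations left (a simplistic estimate; the trainer's own ETA also averages over slower warm-up iterations, so the printed `eta` can come out higher):

```python
# Rough ETA estimate from the 2-GPU log line above.
# All numbers are read directly from that line; this is only a
# sanity-check sketch, not the trainer's actual ETA formula.
iters_per_epoch = 30895   # from "[510/30895]"
epochs = 20               # from "[1/20]"
done = 510                # iterations already finished
sec_per_iter = 1.361      # "time: 1.361"

remaining = epochs * iters_per_epoch - done
eta_days = remaining * sec_per_iter / 86400
print(f"~{eta_days:.1f} days remaining")  # on the order of 10 days
```

The same arithmetic on the 4-GPU line (3.149 s over 15448 iterations per epoch) lands in the same range, which matches the observation that adding GPUs did not shorten the estimate.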

yukang2017 commented 1 year ago

Hi,

I have never tried training on 2 GPUs, so I am not sure about that. It previously took me about 5-6 days to train on 4 GPUs.

Regards, Yukang Chen