Dear Author, I am training a detection model with the config "nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_largekernel3d_large.py" on two V100 GPUs. The conv type is 'spatialgroupconvv2'. It looks like training for 20 epochs will take about 12 days. When training with four V100 GPUs, the estimated time is also about 12 days. Is this normal? It would be very helpful if you could share your training logs, or at least your training time and the number of GPUs you used.
Here is my training log with 2 GPUs:
2023-05-08 19:00:44,415 - INFO - Epoch [1/20][510/30895] lr: 0.00010, eta: 11 days, 17:20:45, time: 1.361, data_time: 0.121, transfer_time: 0.019, forward_time: 0.327, loss_parse_time: 0.001 memory: 7336,
Here is my training log with 4 GPUs:
2023-05-09 10:22:58,283 - INFO - Epoch [1/20][100/15448] lr: 0.00010, eta: 14 days, 20:59:34, time: 3.149, data_time: 0.254, transfer_time: 0.015, forward_time: 0.333, loss_parse_time: 0.000 memory: 6913,
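For reference, here is a rough back-of-envelope check I did from the per-iteration times in the two log lines above (the eta_days helper is just for illustration; the framework's own ETA is a running average, so its numbers differ somewhat):

```python
# Rough sanity check of total training time from the logged per-iteration times.
# Values come from the two log lines above; eta_days is a hypothetical helper,
# not part of the training code.
def eta_days(sec_per_iter, iters_per_epoch, epochs=20):
    return sec_per_iter * iters_per_epoch * epochs / 86400.0  # seconds -> days

print(f"2 GPUs: {eta_days(1.361, 30895):.1f} days")  # ~9.7 days
print(f"4 GPUs: {eta_days(3.149, 15448):.1f} days")  # ~11.3 days
```

So going from 2 to 4 GPUs halves the iterations per epoch (30895 -> 15448) but roughly doubles the time per iteration (1.361 s -> 3.149 s), which is why the total training time does not improve.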