Closed (fjzpcmj closed this issue 1 year ago)
Hi,
Thanks. I fixed it with this line.
Regards, Yukang Chen
@yukang2017 Dear Author, thanks for your reply. I am training the detection model with the config "nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_largekernel3d_large.py" on two V100 GPUs. The conv type is 'spatialgroupconvv2'. It seems that training for 20 epochs will take about 12 days. When training with four V100 GPUs, the estimated time is also about 12 days. Is this normal? It would be very nice if you could share your training logs.
Here is my training log with 2 GPUs: 2023-05-08 19:00:44,415 - INFO - Epoch [1/20][510/30895] lr: 0.00010, eta: 11 days, 17:20:45, time: 1.361, data_time: 0.121, transfer_time: 0.019, forward_time: 0.327, loss_parse_time: 0.001 memory: 7336,
Here is my training log with 4 GPUs: 2023-05-09 10:22:58,283 - INFO - Epoch [1/20][100/15448] lr: 0.00010, eta: 14 days, 20:59:34, time: 3.149, data_time: 0.254, transfer_time: 0.015, forward_time: 0.333, loss_parse_time: 0.000 memory: 6913,
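For anyone comparing the two runs, here is a rough sketch that breaks down the per-iteration time from the two log lines above. It only uses the numbers in those lines; the helper names (`field`, `summarize`) are made up for illustration and are not part of det3d. I am assuming that the part of `time` not covered by the logged sub-timers is mostly backward pass, optimizer step, and gradient synchronization between GPUs, which the log itself does not confirm. The projected days are simply `time` multiplied by the remaining iterations, so they will not match the logged `eta` exactly (the logger presumably averages over more iterations).

```python
import re

# The two log lines quoted above (timestamp prefix omitted).
LOG_2GPU = ("Epoch [1/20][510/30895] lr: 0.00010, eta: 11 days, 17:20:45, "
            "time: 1.361, data_time: 0.121, transfer_time: 0.019, "
            "forward_time: 0.327, loss_parse_time: 0.001 memory: 7336,")
LOG_4GPU = ("Epoch [1/20][100/15448] lr: 0.00010, eta: 14 days, 20:59:34, "
            "time: 3.149, data_time: 0.254, transfer_time: 0.015, "
            "forward_time: 0.333, loss_parse_time: 0.000 memory: 6913,")

# Sub-timers that the log reports separately from the total "time".
SUB_TIMERS = ("data_time", "transfer_time", "forward_time", "loss_parse_time")


def field(line, name):
    """Return the float logged as 'name: value' (hypothetical helper)."""
    # The lookbehind keeps 'time' from matching inside 'data_time' etc.
    return float(re.search(rf"(?<![a-z_]){name}: ([\d.]+)", line).group(1))


def summarize(line):
    """Split the per-iteration time into logged and unaccounted parts."""
    epoch, epochs, it, iters = map(
        int, re.search(r"Epoch \[(\d+)/(\d+)\]\[(\d+)/(\d+)\]", line).groups())
    total = field(line, "time")
    accounted = sum(field(line, name) for name in SUB_TIMERS)
    remaining_iters = (epochs - epoch) * iters + (iters - it)
    return {
        "iters_per_epoch": iters,
        "time": total,
        "accounted": accounted,
        # Assumption: mostly backward pass, optimizer step, gradient sync.
        "remainder": total - accounted,
        "projected_days": remaining_iters * total / 86400,
    }


two = summarize(LOG_2GPU)
four = summarize(LOG_4GPU)
for label, s in (("2 GPUs", two), ("4 GPUs", four)):
    print(f"{label}: time={s['time']:.3f}s accounted={s['accounted']:.3f}s "
          f"remainder={s['remainder']:.3f}s "
          f"projected={s['projected_days']:.1f} days")
```

In both runs `forward_time` is almost the same (0.327 s vs 0.333 s), and the number of iterations per epoch halves with 4 GPUs as expected, but each iteration takes more than twice as long. Nearly all of that extra time falls in the part the log does not break down, so the backward/synchronization side looks like the place to check first.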
https://github.com/dvlab-research/LargeKernel3D/blob/ca786a7a9fa6531db39da9b4eb0dc5149bbdc312/object-detection/det3d/models/backbones/scn_largekernel.py#L132 When running detection with spatialgroupconvv2, this line raises the error "NameError: name 'truncnormal' is not defined".