Hi, may I ask how much GPU memory is needed to train this model? I want to check whether my GPU has enough memory to try it.
@LiewFeng ~18 GB.
Ok. Thanks a lot~
@LiewFeng That's very strange. Can you provide more details about your training? For example, what command did you run, what batch size did you use, and how much time was spent in each case?
Hi, @Cc-Hy. Sorry for the late reply. The command is the same as the one provided in GETTING_STARTED.md, and I didn't modify the batch size. With 1 GPU, the first epoch takes about 10 minutes, so 60 epochs should take about 10 hours; however, the whole run finishes in only 5 hours, which is really strange. With 2 GPUs, the first epoch takes about 6 minutes, so 60 epochs should take about 6 hours, and it does take 6 hours, which is normal. Another phenomenon: CPU utilization is high in the 1-GPU setting but really low in the 2-GPU setting.
The experiments are conducted on the KITTI train split.
@LiewFeng Hi, your 2-GPU training time seems close to mine: each epoch takes ~6 minutes on 2 NVIDIA GeForce RTX 3090s, and ~12 minutes per epoch when I use a single GPU.
So I think your 2-GPU training time is normal. But if your GPUs are really running at very low utilization, you may want to check your CPU status. I once ran into a situation where my CPU was the bottleneck and the GPU could not be fully utilized.
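To make the CPU-bottleneck point concrete, here is a small self-contained sketch (plain Python stdlib, not OpenPCDet code; the `preprocess` function is a hypothetical stand-in for per-batch data augmentation). When single-process preprocessing is slower than the GPU step, epoch time is dominated by the CPU; spreading the work across worker processes, which is what PyTorch's `DataLoader` does when you raise `num_workers`, removes that bottleneck:

```python
# Illustration only: a CPU-heavy per-batch preprocessing step, run
# single-process vs. across worker processes (analogous to raising
# DataLoader num_workers in PyTorch).
import time
from concurrent.futures import ProcessPoolExecutor


def preprocess(i):
    # Stand-in for CPU-heavy augmentation/encoding of one batch.
    s = 0
    for k in range(200_000):
        s += k * i
    return s


def run_epoch(num_workers, num_batches=8):
    """Return (wall time, results) for one simulated epoch."""
    start = time.perf_counter()
    if num_workers <= 1:
        results = [preprocess(i) for i in range(num_batches)]
    else:
        with ProcessPoolExecutor(max_workers=num_workers) as pool:
            results = list(pool.map(preprocess, range(num_batches)))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    t1, r1 = run_epoch(num_workers=1)
    t4, r4 = run_epoch(num_workers=4)
    assert r1 == r4  # same data either way; only wall time differs
    print(f"1 worker: {t1:.2f}s, 4 workers: {t4:.2f}s")
```

In a real run you would instead watch `nvidia-smi` alongside CPU usage: low GPU utilization together with saturated CPU cores usually points at the data pipeline rather than the model.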
Hi, @Cc-Hy. I figured it out: the reason is the PyTorch version. When I ran the experiment with 1 GPU, the PyTorch version was 1.10. When I tried to run with 2 GPUs, training got stuck. I then switched to PyTorch 1.8, which works but is about 2x slower. I am using an A100, which is roughly 2x faster than a 3090. I still get stuck with 2 GPUs. A similar issue seems to have been solved in OpenPCDet, but sadly that fix doesn't work for me.
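One thing that may be worth trying for the multi-GPU hang (a commonly suggested workaround for NCCL-related DDP stalls, not a fix specific to this repo, and not guaranteed to apply here):

```shell
# Standard NCCL environment variables, set before launching training.
# Disabling peer-to-peer transport sometimes unblocks hung DDP runs,
# at some communication-bandwidth cost.
export NCCL_P2P_DISABLE=1
export NCCL_DEBUG=INFO   # log NCCL setup to see where initialization stalls
# then launch the usual 2-GPU training command from GETTING_STARTED.md
```

If the `NCCL_DEBUG=INFO` log shows the processes stalling during communicator setup, that points at an NCCL/driver issue rather than the training code itself.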
Hi, @Cc-Hy. When I train the model on KITTI train, 2 GPUs take more time than 1 GPU, which is really strange. Did you encounter this phenomenon?