WangYueFt / detr3d

MIT License
745 stars 139 forks

A question about GPU-util and training time(detr3D) #32

Closed Etah0409 closed 2 years ago

Etah0409 commented 2 years ago

Hi, I'm trying to train detr3D on nuScenes, but the training behaves in a puzzling way. It's strange that the training would take several weeks, and the GPU utilization looks strange too. I'm training on 4 RTX A6000 GPUs, and sometimes the utilization of one or two of them sits at 0 for a long time. I checked that I can train fcos3d as usual. Could anyone please help me with this? Many thanks!

Etah0409 commented 2 years ago

Also, I tried many times and got an even stranger training time.

Etah0409 commented 2 years ago

I think I've found the reason. After I changed detr3D to detrMono3D, the problem went away. It may be caused by the multi-image data reading and preprocessing, because at the same time somebody else (we work on the same server) was also using the I/O heavily. I didn't calculate how much memory the multi-image preprocessing needs, so this is only my guess :) If anybody else has another idea, we can discuss it together.
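For a rough sense of the memory involved, here is a back-of-the-envelope sketch. It assumes the usual nuScenes camera setup (six cameras, 1600x900 RGB frames) and images converted to float32 after loading; the function name and defaults are illustrative and not taken from the detr3d code. It only counts decoded pixels per sample, not the compressed bytes read from disk or any copies made by augmentation.

```python
def decoded_image_bytes(num_views, height=900, width=1600,
                        channels=3, bytes_per_value=4, samples_per_gpu=1):
    """Bytes held by the decoded images of one batch on one GPU worker.

    Defaults are assumptions: nuScenes-sized RGB frames stored as float32.
    """
    return (samples_per_gpu * num_views * height * width
            * channels * bytes_per_value)


multi_view = decoded_image_bytes(num_views=6)   # six-camera sample
single_view = decoded_image_bytes(num_views=1)  # one-camera sample

print(f"6 views: {multi_view / 2**20:.1f} MiB per sample")   # 98.9 MiB
print(f"1 view:  {single_view / 2**20:.1f} MiB per sample")  # 16.5 MiB
```

Under these assumptions every sample needs six times the reads and decoding work of a single-view sample, which would make the pipeline more sensitive to another process competing for disk I/O.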

a1600012888 commented 2 years ago

Thanks. I don't know why your training time is so strange. Typically, if data time is high, GPU utilization is low, because the GPU is constantly waiting for data reading/preprocessing.
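One way to check whether data loading is the bottleneck is to time how long each iteration spends blocked on the loader compared with the whole iteration. The sketch below is framework-agnostic and uses hypothetical names (`profile_loader`, `train_step`); it is not part of detr3d or mmdetection3d.

```python
import time


def profile_loader(loader, train_step, max_iters=50):
    """Return (mean_data_time, mean_iter_time) in seconds.

    data_time is the time spent waiting for the next batch;
    iter_time is data_time plus the time spent in train_step.
    """
    data_total = 0.0
    iter_total = 0.0
    n = 0
    it = iter(loader)
    while n < max_iters:
        start = time.perf_counter()
        try:
            batch = next(it)  # blocks while the data pipeline produces a batch
        except StopIteration:
            break
        fetched = time.perf_counter()
        # On a GPU, train_step should synchronize (for example with
        # torch.cuda.synchronize()) before returning, otherwise the
        # asynchronous kernels make the compute time look too small.
        train_step(batch)
        end = time.perf_counter()
        data_total += fetched - start
        iter_total += end - start
        n += 1
    if n == 0:
        raise ValueError("loader yielded no batches")
    return data_total / n, iter_total / n
```

If the mean data time is close to the mean iteration time, the GPU is mostly idle waiting for input, and adding loader workers or reducing disk contention is likely to help more than anything on the model side.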

Etah0409 commented 2 years ago

Many thanks for your kindness! After I asked that person to suspend his process, the training time returned to normal. It's the first time I've tried to train on multi-view images, and I'm amazed that 6 images per batch causes this problem while 2 images per batch doesn't. Thanks again : )