Question on 1-st training

sun-yue2002 commented 7 months ago

Hi!

https://github.com/SxJyJay/MSMDFusion/blame/7b5b2741e693ba8007c95e3e8951e4e67fbc47ed/configs/transfusion_nusc_voxel_L.py#L244-L244C10

1. In this line, samples_per_gpu is set to 2, but in an earlier part of the config it is set to 4. I am a little confused by this.
2. When I set workers_per_gpu to 0, as the code does, the utilization of my GPUs (8 RTX 4090) is low and training would take 14 days. Is that normal?
3. Because of the second problem, I set workers_per_gpu to 16 to make better use of my GPUs, with samples_per_gpu set to 4. After training, I get the following result: pts_bbox_NuScenes/NDS: 0.6944, pts_bbox_NuScenes/mAP: 0.6464, which is slightly lower than yours. Is this normal? If not, is it because of the change to workers_per_gpu?

Looking forward to your reply! Best

SxJyJay commented 7 months ago

As far as I remember, setting samples_per_gpu to 2 probably improves performance a little. workers_per_gpu only sets the number of worker processes used to fetch data, so it does not affect model performance.
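
To make the distinction concrete, here is a minimal sketch of how the two settings enter the picture. It assumes standard data-parallel training without gradient accumulation. The key names follow the mmdet-style `data = dict(...)` config convention, while the helper functions and the `workers_per_gpu=4` value are hypothetical and not taken from the repository.

```python
# Sketch only: illustrative values, not the repository's actual config.
NUM_GPUS = 8  # the 8x RTX 4090 setup described in the question

data = dict(
    samples_per_gpu=2,  # per-GPU batch size
    workers_per_gpu=4,  # data-loading worker processes per GPU (assumed value)
)


def effective_batch_size(cfg: dict, num_gpus: int) -> int:
    """Number of samples contributing to one optimizer step across all GPUs."""
    return cfg["samples_per_gpu"] * num_gpus


def total_loader_workers(cfg: dict, num_gpus: int) -> int:
    """Number of data-loading worker processes across all GPUs."""
    return cfg["workers_per_gpu"] * num_gpus


# samples_per_gpu changes the effective batch size ...
print(effective_batch_size(data, NUM_GPUS))                           # 16
print(effective_batch_size(dict(data, samples_per_gpu=4), NUM_GPUS))  # 32

# ... while workers_per_gpu only changes how many processes load data.
print(effective_batch_size(dict(data, workers_per_gpu=16), NUM_GPUS))  # 16
print(total_loader_workers(dict(data, workers_per_gpu=16), NUM_GPUS))  # 128
```

Under these assumptions, going from samples_per_gpu=2 to 4 doubles the effective batch size, which can shift the final metrics if the learning rate is left unchanged, whereas raising workers_per_gpu only affects data-loading throughput.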

sun-yue2002 commented 7 months ago

Got it! Thanks a lot!

SxJyJay / MSMDFusion

Question on 1-st training #35