FishAndWasabi / YOLO-MS

YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection

Questions about Training Time and Test Latency #22

Open leonnil opened 6 months ago

leonnil commented 6 months ago

Hi, I'm attempting to reproduce YOLO-MS-XS (with SE attention). The estimated time for training from scratch (300 epochs) is almost 10 days on 8x RTX 3090 GPUs. The training command, CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash tools/dist_train.sh configs/yoloms/yoloms-xs-se_syncbn_fast_8xb8-300e_coco.py 8, follows the README.

Is it normal for training to take this long? Could you share your training time? By the way, which GPU do you use to measure the model's latency?
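For context, I'm timing inference with a plain single-GPU PyTorch loop like the sketch below (placeholder model and 640x640 input size; this is not necessarily the latency protocol used for the paper's numbers):

```python
import time
import torch

# Generic single-GPU latency sketch; `model` stands in for the detector
# and the 640x640 input is only an example.
model = torch.nn.Conv2d(3, 16, 3).cuda().eval()  # placeholder for the real model
x = torch.randn(1, 3, 640, 640, device="cuda")

with torch.no_grad():
    for _ in range(50):           # warm-up iterations
        model(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(200):
        model(x)
    torch.cuda.synchronize()      # wait for all kernels before stopping the clock
    elapsed = time.perf_counter() - start

print(f"average latency: {elapsed / 200 * 1000:.2f} ms")
```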

sparshgarg23 commented 5 months ago

Same here: on a single A100 GPU, training takes about 5 days.

FishAndWasabi commented 4 months ago

Sorry for the delay. It takes us about 1 day to train YOLO-MS-XS (w/ SE attention) for 300 epochs. The long training time may be caused by a version mismatch between mmcv and pytorch. We recommend mmcv==2.0.0rc4, pytorch==1.12.1, and cuda==11.6. Note that mmcv may need to be recompiled.
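A quick sanity check against these versions could look like the sketch below (the mmcv.ops import is a common way to detect a build mismatch; only standard torch/mmcv attributes are used):

```python
# Environment check against the recommended versions
# (mmcv==2.0.0rc4, torch==1.12.1, CUDA 11.6).
import torch
import mmcv

print("torch:", torch.__version__)   # expect 1.12.1
print("cuda :", torch.version.cuda)  # expect 11.6
print("mmcv :", mmcv.__version__)    # expect 2.0.0rc4

# If mmcv was built for a different torch/CUDA combination, importing its
# compiled ops typically fails, and mmcv must be reinstalled or recompiled.
from mmcv.ops import nms  # noqa: F401
```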

Thanks for your interest in our work! Best Wishes! 😊

sparshgarg23 commented 4 months ago

I am actually using the latest versions. Could it be a batch size issue? I had to reduce the batch size to fit the GPU memory.

FishAndWasabi commented 4 months ago

The batch size we used is 32: 4 images per GPU across 8 GPUs. The GPUs are RTX 3090s.
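If you need a different per-GPU batch size to fit memory, one possible override config is sketched below (the variable name train_batch_size_per_gpu and the merge behaviour follow the usual MMYOLO/MMEngine conventions; check the base config in this repo for the exact names):

```python
# Hypothetical override config, e.g. configs/yoloms/yoloms-xs-se_custom-bs.py
# `train_batch_size_per_gpu` follows the common MMYOLO convention and may
# differ in this repo's base config.
_base_ = './yoloms-xs-se_syncbn_fast_8xb8-300e_coco.py'

train_batch_size_per_gpu = 4  # 4 images/GPU x 8 GPUs = effective batch size 32

# MMEngine merges this dict into the base config's train_dataloader.
# If the effective batch size changes, the learning rate may also need scaling.
train_dataloader = dict(batch_size=train_batch_size_per_gpu)
```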


FishAndWasabi commented 4 months ago

We have met this issue before, but we forget the details. As far as we remember, we solved the problem by downgrading the pytorch version.