leonnil opened 6 months ago
Same here: on a single A100 GPU, training takes about 5 days.
Sorry for the delay. It takes us about 1 day to train YOLO-MS-XS (w/ SE attention) for 300 epochs. The training time issue may be caused by a version mismatch between mmcv and PyTorch. We recommend using mmcv==2.0.0rc4, pytorch==1.12.1, and cuda==11.6. Note that mmcv may need to be recompiled against this combination.
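For reference, a minimal environment setup matching the recommended versions might look like the following. This is a sketch: the exact wheel index URL and the OpenMIM workflow are assumptions based on the usual PyTorch and OpenMMLab install paths, not commands from this repo's README.

```shell
# Install PyTorch 1.12.1 (and the matching torchvision) built against CUDA 11.6
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116

# Install mmcv 2.0.0rc4 via OpenMIM so it is resolved against
# the PyTorch/CUDA combination installed above
pip install -U openmim
mim install "mmcv==2.0.0rc4"
```

If mmcv was installed before PyTorch was downgraded, reinstalling it this way forces it to be rebuilt/resolved for the new PyTorch version.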
Thanks for your interest in our work! Best Wishes! 😊
I am actually using the latest versions. Could it be a batch size issue? I had to reduce the batch size to fit the GPU memory.
The batch size we used is 32: 4 images per GPU across 8 GPUs. The GPUs are RTX 3090s.
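For anyone adapting this setup to a different GPU count, a minimal config-override sketch is below. The field names (`train_batch_size_per_gpu`, `train_dataloader`) are assumptions based on common MMYOLO config conventions, and the 0.01 base learning rate is a placeholder, not the value from this repo's config:

```python
# Sketch of a config override, assuming MMYOLO-style config fields.
# 4 images per GPU x 8 GPUs = effective batch size of 32.
num_gpus = 8
train_batch_size_per_gpu = 4
train_dataloader = dict(batch_size=train_batch_size_per_gpu)

# If you train with fewer GPUs or a smaller per-GPU batch, scale the
# base learning rate linearly so results stay comparable to an
# effective batch size of 32 (linear scaling rule).
base_lr = 0.01 * (train_batch_size_per_gpu * num_gpus) / 32
```

With a single GPU, matching the effective batch size of 32 would require either `train_batch_size_per_gpu = 32` (memory permitting) or gradient accumulation, which is one reason single-GPU runs can behave and pace differently.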
We have met this issue before, but we forget the details. We remember we solved the problem by downgrading the version of pytorch.
Hi, I'm attempting to reproduce YOLO-MS-XS (with SE attention). Training from scratch (300 epochs) is estimated to take almost 10 days on 8× RTX 3090. The training command,
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash tools/dist_train.sh configs/yoloms/yoloms-xs-se_syncbn_fast_8xb8-300e_coco.py 8
, aligns with the README. Is it common for training to take this long? Could you share your training time? By the way, which GPU do you use to test the model's latency?