NVIDIA-AI-IOT / tao_toolkit_recipes

Object detection high loss #9

Open jdaviddx opened 2 years ago

jdaviddx commented 2 years ago

Hello, I am trying to train a YOLOv4 object detection model on the COCO dataset (using half of the classes). After 100 epochs (about 30 hours on 4 V100 GPUs), the loss is 220 and the mAP is ~0.31. Is this a problem, or should I train longer?

Tyler-D commented 2 years ago

Hi @jdaviddx, here is the per-epoch log from when I trained YOLOv4 on COCO with the full 80 classes (evaluation was run in SAMPLE mode, so the mAP is slightly worse than in INTEGRATE mode): yolov4_training_log_cspdarknet53.csv

Back then, I trained with 1 GPU. Could you also try with 1 GPU to help narrow down the issue? Thanks!
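
The gap between the two evaluation modes can be illustrated with a minimal sketch, assuming SAMPLE refers to PASCAL VOC-style 11-point sampling of the precision-recall curve and INTEGRATE to the area under the interpolated curve. The function names and the toy detections below are hypothetical; this is not TAO Toolkit's implementation, only an illustration of why the same detections can score differently under the two modes.

```python
import numpy as np


def ap_sample(recall, precision, num_points=11):
    """Sampled AP: mean of the best precision at evenly spaced recall thresholds."""
    thresholds = np.arange(num_points) / (num_points - 1)
    total = 0.0
    for t in thresholds:
        mask = recall >= t - 1e-9  # small tolerance for float comparison
        total += precision[mask].max() if mask.any() else 0.0
    return total / num_points


def ap_integrate(recall, precision):
    """Integrated AP: area under the interpolated precision-recall curve."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Interpolate: each precision becomes the max precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


# Toy example (hypothetical): 4 ground-truth boxes, detections sorted by
# confidence are TP, TP, FP, TP.
is_tp = np.array([1, 1, 0, 1])
num_gt = 4
cum_tp = np.cumsum(is_tp)
recall = cum_tp / num_gt
precision = cum_tp / np.arange(1, len(is_tp) + 1)

print("sampled AP:   ", ap_sample(recall, precision))
print("integrated AP:", ap_integrate(recall, precision))
```

On this toy curve the sampled value comes out slightly below the integrated one, but the direction and size of the gap depend on the shape of the precision-recall curve, so mAP numbers are only comparable when both runs use the same mode.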

jdaviddx commented 2 years ago

Hello, I downloaded a pretrained ResNet-18 model from NGC and tried to train YOLOv4 on COCO with it. The starting loss was about 2 million, and after 30 epochs it dropped to 300. Do you use any backbones from NGC?

Tyler-D commented 2 years ago

No. For SOTA training, we use the ImageNet-pretrained cspdarknet53. The pretrained models on NGC are trained on Open Images.