BunnyShan opened this issue 6 years ago
I am curious about this. If training from scratch can achieve such good performance, why do researchers use the pre-trained model? Just for faster convergence? I have always thought that training from scratch gives worse mAP than using pretrained weights.
I believe the term "from scratch" means using the ImageNet-pretrained/modified VGG base, but starting the SSD modules with random weights.
I think it means the VGG base layers are pre-trained on ImageNet, and the SSD extra layers are fine-tuned on detection datasets (like VOC, COCO...).
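To make the two readings concrete, here is a minimal PyTorch sketch of the usual setup: the VGG base can optionally be loaded with ImageNet weights, while the SSD extra layers always start from random initialization. `TinySSD`, `build`, and the extra-layer shapes are made up for illustration and are not the actual repo code.

```python
import torch.nn as nn
from torchvision import models

# Hypothetical SSD-style model: a VGG16 "base" plus randomly
# initialized extra detection layers (names are illustrative).
class TinySSD(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = models.vgg16(pretrained=False).features  # VGG16 conv layers
        self.extras = nn.Sequential(                          # SSD extra layers
            nn.Conv2d(512, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),
        )

def build(pretrained_base: bool) -> TinySSD:
    model = TinySSD()
    if pretrained_base:
        # Usual "pretrained" setting: copy ImageNet weights into the
        # base only; the extras keep their random init either way.
        imagenet_vgg = models.vgg16(pretrained=True).features
        model.base.load_state_dict(imagenet_vgg.state_dict())
    # pretrained_base=False corresponds to the stricter reading of
    # "from scratch": every layer starts from random weights.
    return model
```

Under this sketch, the ambiguity in the thread is just the `pretrained_base` flag: some papers call `pretrained_base=True` with random extras "from scratch", while others reserve the term for `pretrained_base=False`.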
In the model performance comparison, does "training from scratch" mean no ImageNet pretraining at all?