BobLiu20 / YOLOv3_PyTorch

Full implementation of YOLOv3 in PyTorch

Details about training code #3

Open themathgeek13 opened 6 years ago

themathgeek13 commented 6 years ago

Hi @BobLiu20, thanks for your excellent code! Can you please provide more details about training? Can it be done without the pretrained weights? How long would it take (how many epochs and at what batch size)? I am not able to get a curve like the one you have shown; the loss only oscillates around its initial value and does not decrease.

themathgeek13 commented 6 years ago

A couple of updates based on my own experiments (may be useful for future users):

  1. Using the pretrained weights and a batch size of 8 on an AWS p2.xlarge instance (NVIDIA K80 GPU), the loss started around 0.7-0.8 and dropped to around 0.4-0.5 after roughly 2500-3000 steps. These are rough figures but should give some idea of what to expect. The mAP computed by the eval.py script was around 0.25-0.3 at this stage. I expect the mAP to increase and the loss to drop to 0.1-0.2, as in the README graph, as the number of steps increases.

  2. Without pretrained weights - have not yet tried this, will update as soon as I do this.

  3. I have been trying to implement a binarized version of YOLO, but could not get the loss to converge below 0.7. It oscillates between 0.7 and 0.9, sometimes jumping above 1, and the mAP oscillates between 0.07 and 0.12. The same binarization scheme worked for MNIST, CIFAR-10 and ImageNet. Suggestions are appreciated (a rough sketch of the binarization approach is below).
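For context on item 3, here is a minimal sketch of the kind of weight binarization I mean (sign binarization with a straight-through estimator, in the spirit of BinaryConnect/XNOR-Net); the class names are my own and not part of this repo:

```python
import torch
import torch.nn as nn

class BinarizeWeight(torch.autograd.Function):
    """Sign-binarize weights in the forward pass; pass gradients
    straight through (clipped where |w| > 1) in the backward pass."""

    @staticmethod
    def forward(ctx, weight):
        ctx.save_for_backward(weight)
        return weight.sign()

    @staticmethod
    def backward(ctx, grad_output):
        (weight,) = ctx.saved_tensors
        # Straight-through estimator: block gradients where |w| > 1.
        return grad_output * (weight.abs() <= 1).float()

class BinaryConv2d(nn.Conv2d):
    """Conv layer that binarizes its weights on the fly; the latent
    full-precision weights are what the optimizer actually updates."""

    def forward(self, x):
        binary_weight = BinarizeWeight.apply(self.weight)
        return nn.functional.conv2d(
            x, binary_weight, self.bias,
            self.stride, self.padding, self.dilation, self.groups)
```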

BobLiu20 commented 6 years ago

Hi @themathgeek13. Thanks for your detailed experiments. I would like to share mine.
Using the newest code (commit 2ca525a) with the default parameters and training on 4 TITAN X GPUs, the mAP reaches >58% after ~10 epochs and the loss is ~0.2.
Anyway, if you want to reach >60% mAP, you should carefully tune the learning rate and the loss weights (see the lambda values in yolo_loss.py). BTW, I have added data augmentation and improved the loss layer in the newest code, so please update.
If you run into any issue, please let me know. Thanks.
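To illustrate what I mean by the loss weights, here is a simplified sketch of how per-term lambdas combine the YOLO loss terms; the names and values are only illustrative, the real ones are in yolo_loss.py:

```python
# Illustrative only: the actual names and values live in yolo_loss.py.
lambda_xy, lambda_wh = 2.5, 2.5      # box-center and box-size regression weights
lambda_conf, lambda_cls = 1.0, 1.0   # objectness and classification weights

def total_loss(loss_xy, loss_wh, loss_conf, loss_cls):
    """Weighted sum of the individual YOLO loss terms."""
    return (lambda_xy * loss_xy + lambda_wh * loss_wh
            + lambda_conf * loss_conf + lambda_cls * loss_cls)
```

Tuning these relative weights (together with the learning rate) is what I mean by adjusting them carefully.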

themathgeek13 commented 6 years ago

Thanks for these updates, they will be very useful. Is this 58% mAP result from training from scratch or with pretrained weights?

BobLiu20 commented 6 years ago

@themathgeek13 Hi, I used ImageNet pretrained weights in this case. I will try training from scratch and share the result with you, but as you know it will take a long time.
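For reference, a generic sketch of loading ImageNet-pretrained backbone weights into a detection model while leaving the head randomly initialized; the checkpoint path, key prefix, and module layout here are assumptions, not this repo's exact code:

```python
import torch
import torch.nn as nn

# Stand-in model: 'backbone' mimics the pretrained feature extractor,
# 'head' the randomly initialized detection layers.
model = nn.ModuleDict({
    "backbone": nn.Conv2d(3, 32, 3, padding=1),
    "head": nn.Conv2d(32, 255, 1),
})

# Hypothetical checkpoint path; keep only backbone tensors and load
# non-strictly so the detection head keeps its random initialization.
checkpoint = torch.load("darknet53_imagenet.pth", map_location="cpu")
backbone_state = {k: v for k, v in checkpoint.items() if k.startswith("backbone.")}
result = model.load_state_dict(backbone_state, strict=False)
print("still randomly initialized:", result.missing_keys)
```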

themathgeek13 commented 6 years ago

Oh that's all right, I just wanted to know because if I modify the network for binarization, I would need to train from scratch since the pretrained weights cannot be used directly. No problem, I will test this myself :+1:

dionysos4 commented 6 years ago

Hi @BobLiu20, I also have a short question about your training. Did you freeze the base network weights (the pretrained feature extractor) during training, or did you train the whole network?

BobLiu20 commented 6 years ago

@dionysos4 The backbone weights are not frozen, but they use a different learning rate. You can see it in the params.py file.
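As a minimal sketch of what that looks like with PyTorch parameter groups (the module layout and learning-rate values below are only illustrative; the real settings are in params.py):

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-in: 'backbone' is the pretrained feature extractor,
# 'head' is the detection part. Values are illustrative only.
model = nn.ModuleDict({
    "backbone": nn.Conv2d(3, 32, 3, padding=1),
    "head": nn.Conv2d(32, 255, 1),
})

optimizer = optim.SGD(
    [
        {"params": model["backbone"].parameters(), "lr": 1e-4},  # smaller lr for pretrained layers
        {"params": model["head"].parameters(), "lr": 1e-3},      # larger lr for the new layers
    ],
    momentum=0.9,
    weight_decay=5e-4,
)
```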

initmaks commented 6 years ago

Hey @themathgeek13, any luck training the net from scratch?

bhargavajs07 commented 6 years ago

Did you try training the darknet_53 model using Adam? Curiously, I noticed that convergence is much worse than with SGD, which is contrary to my expectations. I hoped Adam would work better because of its adaptive learning rates. I am wondering what the cause might be...

(The orange/gray plots are for SGD; the green plot is for Adam.)

(plot: Adam vs SGD convergence)

Let me know if you observed the same phenomenon and already know what the reason might be.
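For reference, the two setups I am comparing look roughly like this (the learning rates and weight decay are illustrative, not my exact values); one thing I still need to rule out is that Adam usually wants a much smaller learning rate than SGD, and its coupled L2 weight decay is not equivalent to SGD's:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 32, 3)  # stand-in for darknet_53; values below are illustrative

# SGD with momentum (the baseline that converges well for me).
sgd = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=4e-5)

# Adam typically needs a smaller learning rate, and its L2 weight decay
# behaves differently; AdamW decouples the decay from the update.
adam = optim.Adam(model.parameters(), lr=1e-4, weight_decay=4e-5)
```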

ThorinGondor commented 6 years ago

@bhargavajs07 Hi, did you train the net on the COCO dataset? Did you succeed in testing your trained model? My trained model produced no detections.

zhaoyang10 commented 5 years ago

@themathgeek13 Hi, have you gotten a good result by training from scratch? I trained on COCO from scratch for 100 epochs, but can only get a very low mAP of about 0.10. I'm really confused.

zhaoyang10 commented 5 years ago

@themathgeek13 With eval_coco.py it is even lower; the results were attached as a screenshot in the original comment.

BCWang93 commented 5 years ago

> 2. Without pretrained weights - have not yet tried this, will update as soon as I do this.

Hi, I have run into a problem recently. I used the code to train on the VOC dataset, but when I test the trained model on an image, the output is empty: no bounding boxes are detected and only the original image is returned. Do you know how to solve this problem? Thank you very much!