Closed chengcchn closed 4 years ago
@hirotomusiker
Hi, thank you for using our repo. Yes, evaluation usually takes time because the evaluator performs forward passes on ALL the validation images (e.g. 5k images for COCO). Please set a larger eval_interval, more than 4000 for example. At iter=20 the model is not trained enough.
@hirotomusiker Thanks for your reply. But I am confused: the evaluation speed should have nothing to do with how well the model is trained. I have only made some modifications to support my own dataset. Moreover, when I run the script "python train.py --cfg config/yolov3_custom_eval.cfg --eval_interval 1 --checkpoint checkpoints/snapshot20.ckpt", the evaluation doesn't start.
OK, please check whether evaluation is actually running by inserting a print line in the inference loop: https://github.com/DeNA/PyTorch_YOLOv3/blob/master/utils/cocoapi_evaluator.py#L62-L91 By default, no logs appear during evaluation.
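For reference, a minimal sketch of what such a progress print could look like inside an inference loop. This is not the repo's actual code; names like `model` and `dataloader` are assumptions for illustration:

```python
import time

def evaluate(model, dataloader, log_every=10):
    """Run inference over all images, printing throughput as we go."""
    start = time.time()
    results = []
    for i, (img, img_id) in enumerate(dataloader):
        outputs = model(img)  # forward pass on one image/batch
        results.append((img_id, outputs))
        # progress print: confirms the loop is alive and shows img/s
        if (i + 1) % log_every == 0:
            rate = (i + 1) / (time.time() - start)
            print(f"evaluated {i + 1} images ({rate:.1f} img/s)")
    return results
```

Printing a running images-per-second figure, rather than a bare marker like "test1.", makes it easy to compare against the ~10 img/s reference speed mentioned below.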
Yes, I have tried this and the logs appear. I can see "test1." and "test2." printed continuously, but the speed is slow.
In my environment it's about 10 images / sec for image size = 608. How about you?
My server has been shut down, I will do the experiment again in the next two days and let you know the result. Thanks a lot!
I did the experiment again. I added "print(A)" at line 91 of "cocoapi_evaluator.py", and the output is as follows. It seems that the evaluation is running, but it takes about 8 secs/image for image size = 416. However, I also tested evaluation on COCO, and there the speed is fast, about 14 images/sec for image size = 608. So is the evaluation speed related to how well the model is trained?
Hi, @hirotomusiker. I redid the experiment and strictly checked the JSON dataset I had converted. I found that if I used scripts like "voc2coco.py" to generate the JSON dataset, an error would occur like in issue46 and issue50. I wrote the conversion script myself and the error disappeared. Finally, I ran the evaluation after 3000 iters and the evaluation speed was normal! So the evaluation speed is related to how well the model is trained. I don't know why.
Thanks for your repo again!
Thank you for reporting. Now I understand the issue. At the beginning of training, the output is random, so noisy boxes are detected everywhere in the image. The postprocess then tries to perform non-max suppression (NMS) on ALL of those boxes. That's why evaluation takes so long. So the only solution is not to evaluate the model at an early stage of training. Thank you!
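To illustrate why an untrained model slows evaluation down: the cost of a naive greedy NMS grows roughly with the square of the number of candidate boxes, so thousands of random boxes passing the confidence threshold make postprocessing crawl. A minimal pure-Python sketch (not the repo's implementation) of that greedy loop:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat.

    Each kept box is compared against every remaining candidate, so the
    work is O(n^2) in the worst case when n boxes survive the confidence
    threshold - which is exactly what happens with an untrained model.
    """
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

With a trained model only a handful of confident boxes reach this loop; with random early-training outputs, thousands do, and the pairwise IoU comparisons dominate evaluation time.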
Thanks for sharing this repo! I tried to use it to train on my own dataset. After I solved the problems I met and started the training, I found the evaluation process is really slow. I set eval_interval to 20 to observe the evaluation, but it just got stuck there. Looking forward to your reply, thanks!