AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Dataset query when training YOLO v3 and avg loss in training not decreasing after many iterations #582

Open golars497 opened 6 years ago

golars497 commented 6 years ago

Hi,

I am training YOLO v3 (352 by 352, as this is the highest resolution my GPU allows at the moment) to detect a single class: knives. The knife would be positioned around 2-3 meters from the camera (1280 by 720). Since YOLO v3 is a lot better than YOLO v2 at picking up small objects, would it be better to build the dataset so that the knives in the input images aren't small, e.g. less than 1 meter from the camera? Or should I go about it the default way and position them like I would when detecting the knives in real life (2-3 meters)?

Many Thanks

AlexeyAB commented 6 years ago

Hi,

If images from the training and detection datasets have the same resolution (1280 by 720), then objects should have the same size, i.e. the knives should be positioned 2-3 meters from the camera.

golars497 commented 6 years ago

Ah, I had that feeling! thanks for the reply! :D

golars497 commented 6 years ago

@AlexeyAB

I wanted to ask something else as well. Because I have access to a number of computers, I am training 5 different models:

YOLOv2 at 416 by 416 px
YOLOv2 at 544 by 544 px
Tiny-YOLOv2 at 448 by 448 px
Tiny-YOLOv2 at 544 by 544 px
YOLOv3 at 352 by 352 px

I am detecting 2 classes: knife and knives (for when a bunch of knives are grouped together). I actually only want to detect 1 class, but during construction of the dataset I accidentally left a bunch of knives piled up somewhere in the input images, so I decided to label them as well because I was worried that leaving them unlabelled might hinder the model's ability to detect. As I said in my first comment, the knives are placed 2-3 meters away from the camera.

For positive samples I have roughly 2300 self-made images with knives (most around 1280 by 720 px) and 800 close-up images of knives from the COCO website (most around 500 by 375 px).

As for negative samples, I have 100 self-made images (1280 by 720 px) and 3000 random images from the 2007-2012 VOC dataset (mostly 500 by 375 px).

I am using the same dataset for all of the models. Now, for what I would actually like to ask: I feel like the avg loss for Tiny-YOLO 448 and 544 isn't getting any lower than 1.2 after many iterations, and even at 20,000+ iterations I am getting some nans (I get some nans on all the models). I also feel like the same might happen to YOLOv2 416 and 544. Is this expected? Should I just wait and keep training the models?

Training loss charts (attached images):
YOLOv2 at 416 by 416 px - yolov2-416
YOLOv2 at 544 by 544 px - yolov2-544px
Tiny-YOLOv2 at 448 by 448 px - tiny-yolo-448
Tiny-YOLOv2 at 544 by 544 px - tiny-yolo-544px
YOLOv3 at 352 by 352 px - yolov3-352px

The cfg files: phase-2-training-analysis-cfg.zip

I am currently using the darknet map function to get mAP against a validation set (data similar to the self-made images) to see whether the detector can actually still pick up knives. Side note: I haven't recalculated the anchors for any of these models; I am just using the default ones that came with the repo.
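A minimal sketch of recalculating anchors with darknet's calc_anchors command, assuming a hypothetical data/obj.data file that lists the training images (the path, cluster count and resolution below are placeholders to adapt to each model's cfg):

```
# Hypothetical .data path; YOLOv2 cfgs use 5 anchors, YOLOv3 uses 9.
# width/height should match the network resolution in the cfg.
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 352 -height 352
```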

Sorry for pestering you and once again thank you for your help :)

AlexeyAB commented 6 years ago

Now, for what I would actually like to ask: I feel like the avg loss for Tiny-YOLO 448 and 544 isn't getting any lower than 1.2 after many iterations, and even at 20,000+ iterations I am getting some nans (I get some nans on all the models). I also feel like the same might happen to YOLOv2 416 and 544. Is this expected? Should I just wait and keep training the models?

As I see it, the avg loss for Tiny YOLO and YOLO is about 0.12 - 0.18, which is much lower than 1.2. Or what do you mean by 1.2?

0.12 is a good avg loss. Any avg loss less than 1 is a good value.

Generally, a low avg loss by itself doesn't guarantee good detection results, so it is better to calculate mAP to check whether the model is trained well.
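A minimal sketch of that check with the repo's map command, assuming hypothetical .data/.cfg/.weights paths (the valid= entry in the .data file should point to the validation image list):

```
# Paths are placeholders; use your own .data, .cfg and trained .weights files.
./darknet detector map data/obj.data cfg/yolo-obj.cfg backup/yolo-obj_final.weights
```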

golars497 commented 6 years ago

Ah yes! Sorry, I did mean 0.12, not 1.2. I've used the darknet map command on all the weights generated for tiny-yolo-544 (with detection resolution set to 544) and the highest mAP I get is around 8-9%. I will send the full text file of results when I get to a PC.

Many thanks

golars497 commented 6 years ago

I did some more testing on Tiny-YOLO-544. I took another video waving around a knife, but in a well-lit area (compared to the one in my validation set, which was in kind of in-between lighting), and it works a lot better. (Still not as robust as I wanted, and I didn't calculate mAP as I didn't have time to annotate the video frames.) Which makes me realize that I still need more diversity in my training data if I want to improve detection. I trained it at 544 px but it works well even at 512 px or 608 px.

AlexeyAB commented 6 years ago

Good training data helps. Also, if you want to train with more variety of lighting, you can train with exposure = 2.5 or exposure = 4.0: https://github.com/AlexeyAB/darknet/blob/1b1221ace5f49b5b464cf7f4ff89f31cf03d3da1/cfg/yolov3.cfg#L15
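For context, that setting lives in the [net] section of the cfg; a minimal excerpt, where the values other than exposure are the repo's usual defaults and may differ from your own cfg:

```
[net]
# ... other [net] parameters (batch, subdivisions, width, height, ...) ...
angle=0
saturation = 1.5
# default is exposure = 1.5; raise it (e.g. 2.5 or 4.0) for more lighting variety
exposure = 2.5
hue=.1
```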