AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Improving Training (Using Negatives or Adjusting the Anchors) #1254

Open tsender opened 6 years ago

tsender commented 6 years ago

Hello,

  1. I am quite new to using ML algorithms. I am on a student project team at my university; we use an NVIDIA Jetson TX1 for object detection in our robotics competition, and we have been training with our own dataset. When I found out about training with negatives, I began adding them to our dataset to reduce the number of false positives. However, I have yet to come across documentation that TRULY explains how to train with negatives.

I understand that every image has a .txt file associated with it, containing 5 numbers per object:

  1. Class ID
  2. x_center/frame_width
  3. y_center/frame_height
  4. box_width/frame_width
  5. box_height/frame_height

I have read that if I want to train with negatives, I simply need an empty .txt file instead of one with actual data in it as described above. When I run my MATLAB script that reads in the video footage and the .mat object with all of the bbox info, I am able to save images to act as negatives and have it save empty .txt files for them as well.
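The original pipeline is a MATLAB script, but the same arithmetic can be sketched in a few lines of Python (the frame size, class id, box values, and file names below are made up, purely for illustration):

```python
# Hypothetical example: 1280x720 frame, one object of class 3 with a
# pixel-space bounding box (x_min=300, y_min=200, width=160, height=150).
frame_w, frame_h = 1280, 720
class_id = 3
x_min, y_min, box_w, box_h = 300, 200, 160, 150

# Normalize exactly as in the list above.
x_center = (x_min + box_w / 2) / frame_w
y_center = (y_min + box_h / 2) / frame_h
w_rel = box_w / frame_w
h_rel = box_h / frame_h

# One line per object in the image's label file:
with open("gate_0001.txt", "w") as f:
    f.write(f"{class_id} {x_center:.6f} {y_center:.6f} {w_rel:.6f} {h_rel:.6f}\n")

# A negative image gets an empty label file:
open("background_0001.txt", "w").close()
```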

My question is: since darknet also needs a .data file indicating the number of classes, .txt files listing all training and validation images, etc., am I also supposed to put these negative images in those lists? That is, are my final training and validation .txt files supposed to contain EVERY image that I want darknet to look at while it trains (images with actual objects AND negatives)?
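For reference, the .data file I mean follows darknet's usual format; a minimal sketch (the paths and names file here are hypothetical) looks something like:

```
classes = 11
train = data/train.txt
valid = data/valid.txt
names = data/robosub.names
backup = backup/
```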

  2. Building on top of this, how exactly does darknet pick images from these files as it trains? For instance, if my dataset has 11 classes, and each class has between 5,000 and 20,000 images, how do I know how many images it has looked at so far from each class? And how does it cycle through everything? I am just asking so I can better understand the training output in the terminal.

  3. Additionally, when I have tried to recalculate the anchors, I get an assertion error saying that N >= K for k-means. Is this because my training list also contains negatives (which have empty .txt files), or is it something else that causes this error? Also, as it reads the images, it outputs that it can't read any labels and that this is only normal for the MSCOCO dataset (or something like that - I can't remember exactly since I am on a Mac and not on Ubuntu right now). Any ideas on how to fix this?

  4. I would really like for yolo to detect objects that are both small and large (relative to the camera frame). I know from previous posts and the main README that a general rule must be met relating the network size, object size, and frame size used in training to the network size, object size, and frame size at detection time. Is there any way to make yolo detect objects of all sizes? And why must this rule be met in the first place? I simply would like to know more about how I can improve our training.

Any suggestions or help is greatly appreciated. Thanks.

AlexeyAB commented 6 years ago

@tsender Hi,

  1. Yes, your train.txt and valid.txt files should contain all images - both with objects and without objects (negative samples); see the sketch after this list.

  2. Darknet picks images from these files in random order. The number of images processed so far is equal to batch * iteration_number, where batch is the batch= value in your cfg-file and iteration_number is the iteration count you see during training. For example, with batch=64, after 1,000 iterations darknet has processed 64 * 1,000 = 64,000 images.

  3. Can you show a screenshot of this error?

  4. No modern neural network is scale-invariant. So to detect objects of all sizes, you should create a training dataset that contains objects of all sizes.
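On point 1, a minimal sketch of how such a list could be built (directory names, extensions, and the validation split are hypothetical; adjust to your own layout):

```python
import os
import random

IMAGE_DIR = "data/img"        # hypothetical: images and their label .txt files side by side
TRAIN_LIST = "data/train.txt"
VALID_LIST = "data/valid.txt"
VALID_FRACTION = 0.1

images = []
for name in sorted(os.listdir(IMAGE_DIR)):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    label = os.path.join(IMAGE_DIR, os.path.splitext(name)[0] + ".txt")
    if not os.path.exists(label):
        # Every image needs a label file next to it; for negatives it is simply empty.
        print("warning: no label file for", name)
        continue
    images.append(os.path.abspath(os.path.join(IMAGE_DIR, name)))

# Shuffle once, then split into training and validation lists.
random.shuffle(images)
split = int(len(images) * VALID_FRACTION)
with open(VALID_LIST, "w") as f:
    f.write("\n".join(images[:split]) + "\n")
with open(TRAIN_LIST, "w") as f:
    f.write("\n".join(images[split:]) + "\n")
```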

tsender commented 6 years ago

@AlexeyAB Sorry for the late reply,

When I try to calculate the anchors with the following command:

./darknet detector calc_anchors cfg/yolov3-tiny_RoboSub18.data -num_of_clusters 9 -width 416 -height 416 -show

I get this error output:

Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)
Can't open label file. (This can be normal only if you use MSCOCO)

all loaded.

OpenCV Error: Assertion failed (N >= K) in kmeans, file /tmp/binarydeb/ros-kinetic-opencv3-3.3.1/modules/core/src/kmeans.cpp, line 244
terminate called after throwing an instance of 'cv::Exception'
what(): /tmp/binarydeb/ros-kinetic-opencv3-3.3.1/modules/core/src/kmeans.cpp:244: error: (-215) N >= K in function kmeans

calculating k-means++ ...Aborted (core dumped)

I have never used the k-means function before, so I don't fully know what this means.

Also, how exactly would I know whether it is necessary to recalculate the anchors? In your README you mentioned that you can change the frame height and width in the [net] layer of the config file to detect smaller objects. I assume that if I were to do this, it would then be necessary to recalculate the anchors?

Thanks.

AlexeyAB commented 6 years ago

@tsender

Can't open label file. (This can be normal only if you use MSCOCO)

Some of the label txt-files are missing from your dataset. Check your dataset by using Yolo_mark: https://github.com/AlexeyAB/Yolo_mark What dataset do you use?
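(The kmeans assertion fails because N, the number of ground-truth boxes that calc_anchors managed to load, is smaller than K, the 9 requested clusters - which is exactly what happens when the label files cannot be opened.) As a rough sanity check alongside Yolo_mark, a small sketch like the following can flag missing or empty label files; the path to the training list is hypothetical, and it assumes the labels sit next to the images:

```python
import os

missing, empty, boxes = [], [], 0
with open("data/train.txt") as f:          # hypothetical path to the training list
    image_paths = [line.strip() for line in f if line.strip()]

for img in image_paths:
    label = os.path.splitext(img)[0] + ".txt"
    if not os.path.exists(label):
        missing.append(label)              # these trigger "Can't open label file"
        continue
    with open(label) as f:
        lines = [l for l in f if l.strip()]
    if not lines:
        empty.append(label)                # legitimate negative sample
    boxes += len(lines)

print(f"{len(image_paths)} images, {boxes} boxes, "
      f"{len(empty)} negatives, {len(missing)} missing label files")
```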

I assume if I were to do this then it would be necessary to recalculate anchors?

Only if you greatly increase the resolution.
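For example (the resolution here is hypothetical), if you raised width= and height= in the [net] section to 608, you could re-run your calc_anchors command at the new resolution, keeping -num_of_clusters equal to the number of anchors your cfg expects, and copy the resulting anchors= line back into the cfg:

./darknet detector calc_anchors cfg/yolov3-tiny_RoboSub18.data -num_of_clusters 9 -width 608 -height 608 -show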