AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

yolov3 #931

Open ssetty opened 6 years ago

ssetty commented 6 years ago

Hello, I am using the AlexeyAB implementation of darknet. Running gen_anchors.py raises an exception:

```
Traceback (most recent call last):
  File "gen_anchors.py", line 165, in <module>
    main(sys.argv)
  File "gen_anchors.py", line 142, in main
    w,h = line.split(' ')[3:]
ValueError: too many values to unpack
```

Command:

```
python gen_anchors.py -filelist ../data/train.txt -output_dir ../anchor_dir/ -num_clusters 3
```

I am not clear on how anchors, k-means, and IoU are related. Please explain.

What changes do we need to make to detect small objects?

Thanks

ssetty commented 6 years ago

My train.txt is as below:

```
/home/aic_subscription/software/darknet/data/obj/BeanBall1.jpeg
/home/aic_subscription/software/darknet/data/obj/BeanBall2.jpeg
/home/aic_subscription/software/darknet/data/obj/DustBall.jpeg
```

In gen_anchors.py I see the lines below:

```python
line = line.replace('JPEGImages','labels')
line = line.replace('.jpg','.txt')
line = line.replace('.png','.txt')
line = line.replace('.jpeg','.txt')
```

What is the significance of this?
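(For context: Darknet keeps one .txt annotation file next to every image, so these replacements simply derive the label-file path from each image path listed in train.txt. For example, with the first path from the train.txt above:)

```python
>>> "/home/aic_subscription/software/darknet/data/obj/BeanBall1.jpeg".replace('.jpeg', '.txt')
'/home/aic_subscription/software/darknet/data/obj/BeanBall1.txt'
```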

AlexeyAB commented 6 years ago
ssetty commented 6 years ago

Thanks AlexeyAB, calc_anchors worked.

  1. How are k-means and IoU used for anchor creation?
  2. I ran custom training without changing the anchors (default settings). The weights at checkpoint 800 could predict, but later checkpoints like 900 and 1000 failed to predict - why is that?
  3. Is it possible to run darknet.exe test programmatically via an API, or do I need to spawn a process?
  4. How do I get the bounding-rectangle coordinates, label, and probability/confidence as a response? As of now we only get an image with bounding rectangles drawn on it.
  5. Ideally, how many images per class are required? Can the test and train sets be the same images (I have few images)?
  6. For small objects you have mentioned changing width/height to 608. Does "small object" mean the region to be detected is small, or that the overall image is small?
  7. Is it possible to detect a small scratch or dent on a metal surface? Here the region to be detected is small. Should I set width/height to 608?
AlexeyAB commented 6 years ago
  1. Anchors are the average sizes of objects for each group (cluster) of objects. The closer the anchors are to the actual object sizes, the higher the IoU, and therefore the higher the mAP (mean average precision). (See the sketch after this list.)
  2. You should train for more than 2000 iterations.
  3. What do you mean?
  4. Use the -ext_output flag - or what do you mean?
  5. You should preferably have 2000 images or more for each class.
  6. "Small object" means the region to be detected is small.
  7. Yes, it is possible. What is the smallest relative size of object (relative to the image size) that you want to detect?
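A minimal illustrative sketch of that idea (this is not the repo's gen_anchors.py or calc_anchors code): k-means over the labeled box sizes, with boxes assigned to anchors by IoU:

```python
import random

def iou_wh(box, anchor):
    """IoU of two boxes aligned at the same corner, compared by (w, h) only."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iterations=100):
    """boxes: list of (w, h) pairs in pixels; returns k anchor (w, h) pairs."""
    anchors = random.sample(boxes, k)
    for _ in range(iterations):
        # Assignment step: each box joins the anchor with the highest IoU
        # (equivalently, the smallest 1 - IoU distance).
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            clusters[best].append(b)
        # Update step: move each anchor to its cluster's average (w, h).
        for i, c in enumerate(clusters):
            if c:
                anchors[i] = (sum(w for w, _ in c) / len(c),
                              sum(h for _, h in c) / len(c))
    return sorted(anchors)
```

The better the final anchors cover the dataset's box sizes, the higher the average IoU between objects and their best-matching anchor, which is what item 1 above describes.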
ssetty commented 6 years ago

Thanks AlexeyAB.

My observations:

(a) For custom training without any changes to the anchors or width/height in the cfg file, some images were detected with the 800-iteration weights file, but the 900- and 1000-iteration weights give no predictions (no bounding boxes) on the same images. [As you mentioned, I will run until 2000 iterations.]

(b) If I change the anchors (after running calc_anchors) and the width/height, I see no predictions at all, even with avg loss < 1 (0.8). The same setup used to work with the default anchors and width/height.

  1. As of now we run darknet test from the OS prompt. Is there an API wrapper so that we can invoke predictions from a Python program?

  2. I will try the -ext_output flag.

  3. Regarding image size: the image is 700x600 but the small scratch in it is about 70x10, so I have now set width/height to 608.

AlexeyAB commented 6 years ago

As of now we run darknet test from the OS prompt. Is there an API wrapper so that we can invoke predictions from a Python program?

Set LIBSO=1 in the Makefile: https://github.com/AlexeyAB/darknet/blob/17520296c730c7d7e2683452b11bf50fc8959688/Makefile#L7

And use this example: https://github.com/AlexeyAB/darknet/blob/master/darknet.py
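For illustration, a minimal sketch of what that wrapper usage looks like, assuming darknet was built with LIBSO=1 and assuming the load_net / load_meta / detect helpers that darknet.py exposed at the time (the cfg/weights/data paths are placeholders):

```python
import darknet  # darknet.py, placed next to the compiled darknet.so

net = darknet.load_net(b"yolo-obj.cfg", b"yolo-obj.weights", 0)
meta = darknet.load_meta(b"data/obj.data")

# detect() returns (class_name, confidence, (x_center, y_center, width, height))
for name, confidence, (x, y, w, h) in darknet.detect(net, meta, b"data/obj/BeanBall1.jpeg"):
    print(name, confidence, x, y, w, h)
```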

ssetty commented 6 years ago

Hi Alexey,

(1) Python program for predictions - darknet.py: does it work only on Windows (because I see a .dll)? I need to perform predictions from a Linux machine.

(2) I ran custom training for the full 2000 iterations (loss < 0.6), changed the width/height, changed the anchors - but when I run darknet test, no predictions are returned. Why is that?

Thanks

AlexeyAB commented 6 years ago
  1. Did you change width/height, anchors before or after training?

  2. How many classes?

  3. What mAP can you get?

ssetty commented 6 years ago
  1. OK, I will try on Linux.

  2. The width/height and anchors were changed before training. I guess I am wrong: should only the anchors be changed before training, and the width/height be changed after training, for testing? I have set batch=64 and subdivisions=32 (since I got a CUDA out-of-memory error).

  3. I have two (2) classes.

  4. Avg IoU = 0.7, avg loss = 0.6.

  5. In your documentation for small-object detection you have mentioned setting layers = -1, 11 plus stopbackward=1. Should I do this?

  6. How do I get mAP (mean average precision)? I only see lines like:
     2010: 0.609990, 0.637123 avg, 0.001000 rate, 25.160743 seconds, 128640 images

  7. Does darknet-yolo provide built-in augmentation like rotate/flip?

Thanks

AlexeyAB commented 6 years ago

The width/height and anchors were changed before training - I guess I am wrong: should only the anchors be changed before training, and the width/height be changed after training, for testing?

You can set a high width/height before training. And you can also set a higher width/height after training, for detection (but after training you shouldn't set it higher than 2x the training width/height).

How do I get mAP (mean average precision)?

https://github.com/AlexeyAB/darknet#when-should-i-stop-training

```
./darknet detector map data/obj.data yolo-obj.cfg backup/yolo-obj_7000.weights
```

Does darknet-yolo provide built-in augmentation like rotate/flip?

Horizontal flip - yes. Rotate - no.

In your documentation for small-object detection you have mentioned setting layers = -1, 11 plus stopbackward=1. Should I do this?

stopbackward=1 isn't related to small objects - it speeds up training at the cost of accuracy.

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

to speed up training (with decreased detection accuracy) do Fine-Tuning instead of Transfer-Learning: set the param stopbackward=1 in one of the penultimate convolutional layers before the 1st [yolo]-layer, for example here: https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L598


https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

set layers = -1, 11 and stride=4 to detect small objects

for training for small objects - set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717
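Concretely, that corresponds to editing the last [upsample]/[route] pair in yolov3.cfg roughly as below (the stock values shown in the comments are what the standard yolov3.cfg carries at the two lines linked above; verify against your own copy):

```
[upsample]
stride=4          # stock cfg: stride=2 (the cfg line 717 linked above)

[route]
layers = -1, 11   # stock cfg: layers = -1, 36 (the cfg line 720 linked above)
```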

ssetty commented 6 years ago

Hi Alex, map gives: mean average precision (mAP) = 0.835825, or 83.58 %.

Before training I changed the width/height and anchors, but testing produced no bounding boxes even though mAP is 0.8. I now need to run training again - should I revert to the defaults?

So horizontal_flip is handled by the darknet framework and I need not change any settings, correct?

Thanks

AlexeyAB commented 6 years ago

So horizontal_flip is handled by the darknet framework and I need not change any settings, correct?

Yes, if you want to use it. But if you want to disable horizontal_flip, then set flip=0 in the [net] section.
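For example, at the top of the cfg file:

```
[net]
# ... existing batch/width/height settings ...
flip=0   # disable horizontal-flip augmentation
```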

Before training I changed the width/height and anchors, but testing produced no bounding boxes even though mAP is 0.8. I now need to run training again - should I revert to the defaults?

ssetty commented 6 years ago
  1. Yes, I got mAP = 0.8 after training with the changed width/height and anchors, but during testing no regions are detected.
  2. No, as of now I have not set layers = -1, 11 and stride=4. Should I do this and re-train?

Thanks

AlexeyAB commented 6 years ago

Show the full output or a screenshot of the detector map command.

ssetty commented 6 years ago

Attached is the map output: map.txt

```
Total Detection Time: 16.000000 Seconds
seen 32
detections_count = 344, unique_truth_count = 252
rank = 0 of ranks = 344
rank = 100 of ranks = 344
rank = 200 of ranks = 344
rank = 300 of ranks = 344
class_id = 0, name = dent, ap = 87.00 %
class_id = 1, name = crack, ap = 80.16 %
for thresh = 0.25, precision = 0.95, recall = 0.81, F1-score = 0.87
for thresh = 0.25, TP = 203, FP = 10, FN = 49, average IoU = 73.98 %
mean average precision (mAP) = 0.835825, or 83.58 %
```

Regards

AlexeyAB commented 6 years ago

for thresh = 0.25, TP = 203, FP = 10, FN = 49, average IoU = 73.98 %

mean average precision (mAP) = 0.835825, or 83.58 %

So that is a good mAP. Also, you can detect 203 of the 252 objects in your training dataset when using -thresh 0.25. Can you detect any objects on several images from the training dataset data/train.txt by using such a command?

```
./darknet detector test data/obj.data yolo-obj.cfg yolo-obj.weights -dont_show -ext_output < data/train.txt > result.txt
```

What can you see in result.txt?

Before training I changed the width/height and anchors, but testing produced no bounding boxes even though mAP is 0.8.

Are you trying to detect objects on images that are not from the train or valid datasets?

ssetty commented 6 years ago
  1. "Can you detect any objects on several images from the training dataset data/train.txt by using such a command?" Yes - in result.txt I can see bounding boxes predicted (see the parsing sketch after this comment):

```
Enter Image Path: Photos_1_1527573842636.jpeg: Predicted in 0.128984 seconds.
torn_crack: 65% (left: 554 top: 304 w: 78 h: 172)
torn_crack: 87% (left: 611 top: 317 w: 51 h: 169)
Bounding Box: Left=554, Top=303, Right=632, Bottom=476
Bounding Box: Left=611, Top=317, Right=662, Bottom=485
```

  2. For new images not from the train or validation set, NO bounding boxes are predicted. Has some overfitting happened? Is it required that every class have an equal number of data points?

Thanks
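(As an aside: -ext_output lines in the format shown above are easy to consume programmatically. A minimal sketch, assuming a result.txt produced by the command given earlier:)

```python
import re

# Matches lines like: "torn_crack: 65% (left: 554 top: 304 w: 78 h: 172)"
DETECTION = re.compile(
    r"(?P<label>\w+):\s*(?P<conf>\d+)%\s*"
    r"\(left:\s*(?P<left>\d+)\s*top:\s*(?P<top>\d+)\s*w:\s*(?P<w>\d+)\s*h:\s*(?P<h>\d+)\)"
)

with open("result.txt") as f:
    for line in f:
        m = DETECTION.search(line)
        if m:
            print(m.group("label"), int(m.group("conf")),
                  int(m.group("left")), int(m.group("top")),
                  int(m.group("w")), int(m.group("h")))
```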

AlexeyAB commented 6 years ago

I think this rule is broken: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

General rule - you should keep the relative sizes of objects in the training and testing datasets roughly the same:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width

train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

ssetty commented 6 years ago

I am not sure if this is the case. Please give me an example of "train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width".

I tried resizing the test image to 608x608, but still no predictions.

Is it because I am training with only 600 images, which is too few, and the model is over-fitting? (300 images + 300 from augmentation with horizontal flip.)

Thanks

AlexeyAB commented 6 years ago

Example: https://github.com/AlexeyAB/darknet/issues/977#issuecomment-394319810

What is the image size in the training dataset? What is the image size of the new images not from the train or validation set?

ssetty commented 6 years ago

(1) What is the image size in the training dataset? The images are of varying sizes, like 640x480, 1306x979, and 1228x921. Should I resize them to a fixed size (say 800x600, 640x360, or 300x300) and then run BBox-Label-Tool? Do you suggest a best fixed size to resize to?

(2) The image sizes of new data not in train/validation are like 800x443 or 323x207. Should I resize both train and test images to a fixed size?

Regarding the formula "train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width":

train_network_width = 608. train_obj_width = is this the actual width of the image? train_image_width = what is this?

Thanks

AlexeyAB commented 6 years ago

@ssetty

No, you shouldn't resize your image.

Your training dataset should include the same [/2, x2] relative sizes of objects that you want to detect.


For example, with width=608 in the cfg file:

  1. You train on a 1920x1080 image where the object is 200x100, so:

     train_network_width = 608, train_obj_width = 200, train_image_width = 1920

     train_network_width * train_obj_width / train_image_width = 608*200/1920 = 63

  2. Then you want to test on another image of 300x200 where the object is 150x100, so:

     detection_network_width = 608, detection_obj_width = 150, detection_image_width = 300

     detection_network_width * detection_obj_width / detection_image_width = 608*150/300 = 304

63 is very different from 304 - more than 2x.
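(The same arithmetic as a tiny sketch, handy for checking your own numbers:)

```python
def obj_width_at_network_scale(net_w, obj_w, img_w):
    """Object width in pixels after the image is resized to the network width."""
    return net_w * obj_w / img_w

train = obj_width_at_network_scale(608, 200, 1920)  # ~63 px
test = obj_width_at_network_scale(608, 150, 300)    # 304 px
print(test / train)  # ~4.8 - far outside the [0.5, 2.0] band, so detection suffers
```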

ssetty commented 6 years ago
  1. OK. Regarding the terminology, please correct me if I am wrong:

train_network_width = [cfg file width]
train_obj_width = [width of the object to be detected within the whole image]
train_image_width = [width of the image within which the object is to be detected]

train_network_height = [cfg file height]
train_obj_height = [height of the object to be detected within the whole image]
train_image_height = [height of the image within which the object is to be detected]

Please confirm my understanding is correct.

But the problem here is that the raw vehicle images are of various sizes, and the cracks or dents to be detected within them are also of various sizes. That is why I suggested resizing all raw images (train and test) to a fixed size and then marking the regions for object detection (which also vary, because cracks/dents vary in size).

Thanks

AlexeyAB commented 6 years ago
  1. Yes. All sizes are measured in pixels.

that is why I suggested resizing all raw images (train and test) to a fixed size

All images will be resized automatically to the network size (width=416 height=416) during Training or Detection.

ssetty commented 6 years ago

Thanks. So even if the source images are of different sizes and the objects to be detected within them are also of different sizes, what needs to be done for detection to work during testing (on data not in train/validation)? Since I had a small number of images I kept train and validation the same - I hope this is not an issue.

Regards

AlexeyAB commented 6 years ago

what needs to be done for detection to work during testing (on data not in train/validation)?

Your training dataset should include the same [/2, x2] relative sizes (width_of_object / width_of_image) of objects that you want to detect. I.e., your training dataset should include the set of relative object sizes that you want to detect, differing by no more than 2x.

ssetty commented 6 years ago
  1. I had no option but to continue with images/objects of varying sizes. I decreased thresh to 0.2 and some objects are now detected. Any other options?

  2. For programmatic prediction on Linux using Python with https://github.com/AlexeyAB/darknet/blob/master/darknet.py - does it require a .so file (darknet.so)? How do I generate it?

  3. Does object detection distinguish between glare, reflection, shadow, and an actual scratch on a glass surface?

Thanks

AlexeyAB commented 6 years ago
  1. Add more images: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

  2. Yes, build with LIBSO=1 in the Makefile: https://github.com/AlexeyAB/darknet#how-to-compile-on-linux (see the build sketch after this list)

  3. If you teach the neural network to distinguish them (i.e., if you provide images with glare, reflection, and shadow), then it will distinguish them.
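A typical build sequence on Linux might look like the sketch below (an assumption about a standard GNU make setup, where the command-line assignment overrides the Makefile's LIBSO=0 default):

```
make clean
make LIBSO=1   # produces darknet.so next to the darknet binary
```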

ssetty commented 6 years ago

Hi Alex,

  1. Can we use BBox-Label-Tool instead of Yolo_mark? https://github.com/puzzledqs/BBox-Label-Tool

  2. "General rule - your training dataset should include such a set of relative sizes of objects that you want to detect - differing by no more than 2 times."

     In my case the above rule does not hold (as mentioned earlier) - the sizes of the images/objects to be detected vary. So if I keep all your settings at their defaults (width/height = 416, default anchor values, no changes to the layers values) - will this be fine? The defaults run fast for me with less GPU time (since the network size is small - 416).

  3. In the guidelines you have mentioned: "Desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty .txt files)."

Please clarify: won't an empty file give an error while running darknet training?

Thanks

AlexeyAB commented 6 years ago

@ssetty Hi,

  1. Yes, you can use BBox-Label-Tool, but after labeling you should check your dataset by using Yolo_mark.

  2. Yes, you can keep the default settings, but your training and test datasets should be very similar.

  3. An empty txt-file won't give any error for Yolo v2/v3.

ssetty commented 6 years ago

I have created darknet.so as per your instructions. However, while executing darknet.py (for testing) I receive the error below:

```
Traceback (most recent call last):
  File "darknet.py", line 122, in <module>
    lib = CDLL("./darknet.so", RTLD_GLOBAL)
  File "/home/aic_subscription/anaconda3/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/libgdal.so.20: undefined symbol: sqlite3_column_table_name
```

I am using Python 3.6.3 :: Anaconda, Inc.

Thanks

AlexeyAB commented 6 years ago

@ssetty

ssetty commented 6 years ago

did you compile Darknet with LIBSO=1 in Makefile?

Yes.

Do you have the darknet.so file next to darknet.py?

Yes.

Which of these 2 files did you run - on Linux, darknet/darknet.py, or on Windows, darknet/build/darknet/x64/darknet.py?

I am running the Linux version (darknet/darknet.py).

ssetty commented 6 years ago

Hi Alex, any suggestions for the above issue?

AlexeyAB commented 6 years ago

OSError: /usr/lib/libgdal.so.20: undefined symbol: sqlite3_column_table_name

As far as I can see, your Python tries to call an sqlite3 function and can't find it.

Try to re-install Python, or install sqlite3: https://docs.djangoproject.com/en/2.0/ref/contrib/gis/install/spatialite/#installing-spatialite

Something related to this: https://stackoverflow.com/questions/49944614/oserror-usr-lib-libgdal-so-20-undefined-symbol-sqlite3-column-table-name

ssetty commented 6 years ago

I am not able to install sqlite3 - it gives the same error.

Otherwise, for every prediction I need to spawn a new process (invoking darknet detector from the command line). Is there any option to pass a folder name (containing multiple images to be predicted)?

I found that the default cfg settings / default anchors give better results (because all my images are of various sizes, and the regions to detect are of various sizes too). For training I kept the size at 416 and for prediction 608. During testing should I set batch and subdivisions to 1?

Thanks

AlexeyAB commented 6 years ago

During testing should I set batch and subdivisions to 1?

It isn't necessary in this repository, but you should do it if you use the original pjreddie repo.

ssetty commented 6 years ago

Hi Alex, what changes need to be done if I invoke predictions from the command line and want the class name, probability, and bounding-box coordinates as the response?

I also need to plot the loss curve. Is it possible to get it from the weights/checkpoints (for instance, iterations vs. loss)? In this case would it be avg loss or avg IoU?

Thanks
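(As an aside on the loss-curve question: the usual approach is to redirect the training console output to a file and parse it afterwards. A minimal sketch, assuming log lines shaped like the one quoted earlier - "2010: 0.609990, 0.637123 avg, ..." - captured in a hypothetical training.log:)

```python
import re
import matplotlib.pyplot as plt

# "2010: 0.609990, 0.637123 avg, 0.001000 rate, ..." -> iteration 2010, avg loss 0.637123
PATTERN = re.compile(r"^\s*(\d+):\s*[\d.]+,\s*([\d.]+) avg")

iters, avg_losses = [], []
with open("training.log") as f:
    for line in f:
        m = PATTERN.match(line)
        if m:
            iters.append(int(m.group(1)))
            avg_losses.append(float(m.group(2)))

plt.plot(iters, avg_losses)
plt.xlabel("iteration")
plt.ylabel("avg loss")
plt.savefig("loss.png")
```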

ssetty commented 6 years ago

Hi Alex,

In your performance-improvement guidelines you have mentioned negative samples:

"desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty .txt files) - use as many images of negative samples as there are images with objects"

Is the below correct?

1 0.0 0.0 0.0 0.0

That means if I have 100 positive samples for class 1 (without zeros), I need to have 100 negative samples for class 1 (with zeros) - is this correct?

Thanks
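(For reference, the guideline quoted above says "empty .txt files": a negative sample is an image whose label file exists but contains nothing, rather than a line of zeros. A hypothetical dataset layout, with made-up file names:)

```
data/obj/dent_001.jpeg   # positive sample
data/obj/dent_001.txt    # contains "0 0.48 0.53 0.12 0.07" (class x_center y_center width height, normalized to 0..1)
data/obj/clean_001.jpeg  # negative sample
data/obj/clean_001.txt   # empty file (zero bytes)
```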