AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Training configuration and get drawing images #2455

Closed · JonnySme closed this 3 years ago

JonnySme commented 5 years ago

Hello!

  1. I have faced problems training yolov3. I want to train tiny yolov3 weights. I tried "yolov3-tiny_3l.cfg": avg loss did not fall below 2.7 by 37,000 iterations. I also tried "yolov3-tiny_xnor.cfg": avg loss did not fall below 4 by 35,000 iterations. I use the latest version from your repository. How can I train a good tiny model for detection?

  2. I tried using the original "save_image" from the image.c file to save images after detection (with drawn boxes), and noticed that enabling save_image reduces the FPS by 3-4 frames. How can this be solved? Or is there another way to get an output image with drawn boxes?

Regards, Jonny.

AlexeyAB commented 5 years ago

@JonnySme Hi,

  1. avg loss = 2.7 - 4 can be a good loss for difficult datasets.

  2. Did you apply it to video? There can be a bottleneck on a slow HDD.

JonnySme commented 5 years ago

Thanks for your answer!

  1. I used the -map check; unfortunately some classes still show only 0.87%, some 20%, and none of the classes exceeds 50%. In the .cfg file I set only the width and height (608x608) and classes and filters accordingly. My dataset contains both small objects and large ones; what settings for a tiny model can you recommend? I work on a TX2.

  2. Yes, in a loop I save each output image with the drawn boxes, using save_image from the image.c file. FPS starts to fall by 1 frame at the execution of `image copy = copy_image(p);`, and drops by 4 frames when this loop starts:

```c
for(y = 0; y < p.h; ++y){
    for(x = 0; x < p.w; ++x){
        for(k = 0; k < p.c; ++k){
            disp->imageData[y*step + x*p.c + k] = (unsigned char)(get_pixel(copy,x,y,k)*255);
        }
    }
}
```

AlexeyAB commented 5 years ago

  1. Can you show your detector map output? And show the cloud of points that you get by using the calc_anchors command with the -show flag.

  2. Next week I will optimize many of the image functions.
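
In the meantime, a minimal sketch of the kind of change that can help (my assumption about one possible fix, not the planned patch): index the planar float buffer of darknet's `image` struct directly instead of calling `get_pixel()` once per pixel, and skip the full-frame `copy_image()`. This assumes the channel-major `c*h*w` layout that `get_pixel()` in image.c reads from; `p`, `disp`, and `step` are the variables from the loop above:

```c
/* Sketch: copy darknet's planar float image straight into the interleaved
 * IplImage buffer. get_pixel(m,x,y,k) is m.data[k*m.h*m.w + y*m.w + x],
 * so each color plane can be walked with plain pointer arithmetic.
 * If the original code called rgbgr_image() on the copy first, write to
 * channel (p.c - 1 - k) instead of k to keep the same BGR ordering. */
int x, y, k;
for (k = 0; k < p.c; ++k) {
    const float *plane = p.data + (size_t)k * p.h * p.w;  /* one color plane */
    for (y = 0; y < p.h; ++y) {
        unsigned char *row = (unsigned char *)disp->imageData + (size_t)y * step;
        const float *src = plane + (size_t)y * p.w;
        for (x = 0; x < p.w; ++x) {
            row[x * p.c + k] = (unsigned char)(src[x] * 255);
        }
    }
}
```

This removes one full-frame copy and one function call per pixel, which is a plausible source of the 3-4 FPS being lost.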

JonnySme commented 5 years ago

@AlexeyAB

  1. To answer question 1: detector map output (37,000 iterations, yolov3-tiny_3l.cfg, trained from the conv15 model):

```
detections_count = 2025098, unique_truth_count = 119097
class_id = 0, name = object1, ap = 1.56 %
class_id = 1, name = car, ap = 21.61 %
class_id = 2, name = person, ap = 17.27 %
class_id = 3, name = object2, ap = 1.05 %
class_id = 4, name = object3, ap = 0.16 %
class_id = 5, name = object4, ap = 24.39 %
class_id = 6, name = speed, ap = 11.49 %
class_id = 7, name = object5, ap = 0.52 %
class_id = 8, name = object6, ap = 14.22 %
class_id = 9, name = dog, ap = 11.77 %
class_id = 10, name = cat, ap = 0.00 %
class_id = 11, name = object7, ap = 0.00 %
class_id = 12, name = object8, ap = 10.84 %
for thresh = 0.25, precision = 0.69, recall = 0.05, F1-score = 0.09
for thresh = 0.25, TP = 5616, FP = 2521, FN = 113481, average IoU = 48.39 %

mean average precision (mAP) = 0.088371, or 8.84 %
Total Detection Time: 1060.000000 Seconds
```
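
(For reference, these numbers are internally consistent: precision = TP/(TP+FP) = 5616/8137 ≈ 0.69 and recall = TP/(TP+FN) = 5616/119097 ≈ 0.047, so at thresh = 0.25 the detector finds under 5% of the ground-truth boxes; that missing recall is what drags the mAP down to 8.84%.)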

(attached: cloud-of-points image from calc_anchors -show)

  2. Great, I will wait! For now, what can you advise? Maybe there is another way to get frames with drawn detection boxes, like in the bytes example?

Regards.

AlexeyAB commented 5 years ago

What command did you use to get this cloud of points? What anchors did you get? And how many images do you have for each class (from - to), and in total?

JonnySme commented 5 years ago

@AlexeyAB

Regards.

AlexeyAB commented 5 years ago

It looks like your dataset is too complex: there are both very small and very big objects.

So try to train yolov3.cfg with default anchors and width=832 height=832

Or try to train yolov3_5l.cfg with default anchors and width=608 height=608


Or just try to train yolov3-tiny_3l.cfg with default anchors or follow the rule: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Or recalculate the anchors for your dataset, for the width and height from your cfg-file:

```
darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416
```

Then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file, but change the indexes in mask= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.

More: https://github.com/AlexeyAB/darknet/issues/2463

JonnySme commented 5 years ago

@AlexeyAB Thank you very much for the answers! You are right about the dataset.

At the moment I am training with the anchors (23.8258,36.2225, 58.1867,109.3752, 89.1274,229.4225, 195.5299,142.3834, 126.7489,380.0141, 288.1019,325.9388, 521.4481,197.4485, 239.1332,519.3303, 504.4526,514.2428) that I showed you, using the file yolov3-tiny_3l.cfg. There is no way to check yet; I will write to you when I see the results.

The mAP I showed you is the result of 37,000 iterations using yolov3-tiny_3l.cfg with the default anchors.

Sorry to ask again, but could you please explain: it turns out I need to change the mask= in each of the 3 layers? Looking at the example of yolov3-tiny_3l.cfg: in my case the first anchors are "23.8258,36.2225, 58.1867,109.3752", and 36.2225 and 109.3752 exceed 30x30 and 60x60, so do I need to change the mask= parameters? If so, how do I choose them correctly for the 3 layers? Tiny is preferable for me, since only with tiny do I get a stable 10-15 FPS on the TX2.

Regards.

AlexeyAB commented 5 years ago

@JonnySme

> Total 137256 images

So you should train for more than 130,000 iterations.


Try to use:

```
[net]
width=608
height=608

learning_rate=0.001
burn_in=1000
max_batches=140000
steps=100000,120000
scales=0.1,0.1
# ... the remaining parameters by default

[yolo]
anchors = 23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514
mask = 2,3,4,5,6,7,8
num=9
...

[yolo]
anchors = 23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514
mask = 1
num=9
...

[yolo]
anchors = 23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514
mask = 0
num=9
```
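
(Note: steps=100000,120000 with scales=0.1,0.1 multiplies the learning rate by 0.1 at iteration 100000 and again at 120000. num=9 declares all nine anchors in each [yolo] layer, and mask= selects which of them that layer actually predicts: here the 1st [yolo] layer takes the seven largest anchors and the other two layers take one small anchor each.)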
JonnySme commented 5 years ago

@AlexeyAB Hello!

  1. About training: I tried to start training with the settings you suggested, including the mask= values from the configuration above. When I start training, an error occurs: `filters= in the [convolutional]-layer doesn't correspond to classes= or mask= in [yolo]-layer`

I use "yolov3-tiny_3l.cfg" with the conv15 model. But if I leave the mask= parameters at their defaults, there is no error and training starts without problems. Any suggestions about this error? Below is the configuration with which training runs without errors:

```
[yolo]
mask = 6,7,8
anchors = 23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514
num=9
...

[yolo]
mask = 3,4,5
anchors = 23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514
num=9
...

[yolo]
mask = 0,1,2
anchors = 23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514
num=9
```

  2. I would like to ask about the code; could you explain a few things, please? Previously the image.c file had the functions "ipl_into_image" and "fill_image_from_stream"; what are these functions replaced with now? I am particularly interested in "ipl_into_image" (file image.c). I noticed that some functions differ from those in the original repository. Could you also explain "detect_in_thread" (file demo.c)? For example, the code in the pjreddie repo is `float *X = buff_letter[(buff_index+2)%3].data;`, while in your repo it is `float *X = det_s.data;`. What is the difference? You changed `buff_letter` to `det_s.data`.

Regards.

AlexeyAB commented 5 years ago

@JonnySme

  1. How many classes do you have?

  2. This is the same, but implemented in a different way.
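
(For reference on question 1, the usual cause of that error: in each [convolutional] layer directly before a [yolo] layer, filters must equal (classes + 5) * <number of indexes in that layer's mask=>. For example, 13 classes with mask=6,7,8 (3 indexes) give (13 + 5) * 3 = 54 filters, while mask=2,3,4,5,6,7,8 (7 indexes) would require (13 + 5) * 7 = 126. On question 2: as far as I can tell from demo.c, pjreddie's version keeps a 3-slot ring buffer of letterboxed frames (buff_letter) that the threads rotate through, while this fork keeps a single pre-resized frame det_s that the fetch thread fills, so detect_in_thread simply reads det_s.data.)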

JonnySme commented 5 years ago

@AlexeyAB

  1. I have 13 classes; filters I set to 54. 14,000 iterations have now passed, training with the anchors "23,36, 58,109, 89,229, 195,142, 126,380, 288,325, 521,197, 239,519, 504,514", with mask= left at default. I wrote about it in the previous comment. Now avg is 2.0, batch=64, subdivisions=4.

  2. Do your implementations of these functions speed up the work, or do they not affect performance?

JonnySme commented 5 years ago

Hey. I'm afraid you missed or forgot about my comments, @AlexeyAB; I'm really looking forward to your reply. 44,800 iterations have already passed, and the mAP and avg values have not changed since 14,000 iterations: the same numbers.

AlexeyAB commented 5 years ago

@JonnySme Hi,

Can you rename your cfg-file to a txt-file and drag-and-drop it into your message?

JonnySme commented 5 years ago

@AlexeyAB Thanks for your answer! This is my cfg file:

AlexeyAB commented 5 years ago

@JonnySme In your case, it may be better to train with the default anchors.

AlexeyAB commented 5 years ago

@JonnySme

I don't know why you get memory leaks.

Just check these lines: https://github.com/AlexeyAB/darknet/blob/b751bac17505a742f149ada81d75689b5e692cde/src/demo.c#L71-L73

https://github.com/AlexeyAB/darknet/blob/b751bac17505a742f149ada81d75689b5e692cde/src/demo.c#L293

https://github.com/AlexeyAB/darknet/blob/b751bac17505a742f149ada81d75689b5e692cde/src/demo.c#L95

JonnySme commented 5 years ago

@AlexeyAB About the memory leak: it is my fault, I forgot to uncomment my image release. Sorry.

Why do you need this line? `in_s = get_image_from_stream_letterbox(cap, net.w, net.h, net.c, &in_img, cpp_video_capture, dont_close_stream);` As I understand it, it is not used; frames are taken through the function "get_image_from_stream_resize".

Thanks.