aditbhrgv opened this issue 5 years ago
@aditbhrgv Hi,
@AlexeyAB Thanks for your reply!
Attached is the .cfg file.
[net]
batch=64
subdivisions=32
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.0005
burn_in=2000
max_batches = 35000
policy=steps
steps=360000,380000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 6,7,8
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
classes=2
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 3,4,5
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
classes=2
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -3

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 6

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 0,1,2
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
classes=2
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
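As a sanity check on the cfg above: the five stride-2 [maxpool] layers downsample the 608x608 input by a factor of 32 (the stride-1 maxpool keeps the resolution), and each [upsample] doubles the grid for the next [yolo] layer. A quick back-of-the-envelope sketch, not a parse of the cfg:

```python
width = 608
stride2_maxpools = 5                   # the cfg has five [maxpool] layers with stride=2
base_stride = 2 ** stride2_maxpools    # overall downsampling factor: 32

# The first [yolo] layer works on the coarsest grid; each [upsample]
# with stride=2 doubles the grid for the next [yolo] layer.
grids = [width // (base_stride >> i) for i in range(3)]
print(grids)  # [19, 38, 76]
```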
The training and validation datasets are separate. There are no intersections between them.
Did you split them uniformly at random, or not?
Did you check your dataset by using Yolo_mark?
Can you show the cloud.png image after this command?
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 -show
No. I checked using Yolo_mark and it shows correct bounding boxes on the images. Attached is the cloud.png.
@aditbhrgv
Try to train by using these mask and filters values from the beginning:

filters=7
[yolo]
mask = 8
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42

.....

filters=14
[yolo]
mask = 6,7
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42

...

filters=42
[yolo]
mask = 0,1,2,3,4,5
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
Thanks! I'll try that. Can you please tell me the reasoning behind doing this? It would be really helpful!
After training, show your Loss & mAP chart.
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
Recalculate anchors for your dataset for the width and height from the cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of anchors in mask= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. Also you should change filters=(classes + 5)*<number of masks> before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
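The arithmetic behind the suggested mask/filters pairs is mechanical: the [convolutional] layer before each [yolo] layer needs filters=(classes + 5)*<number of masks>, and larger anchors go to the coarser (earlier) [yolo] layers. A minimal sketch in plain Python, using the classes=2 and the mask split suggested in this thread:

```python
# Anchors from calc_anchors, as (width, height) pairs.
anchors = [(8, 10), (11, 12), (14, 11), (18, 14), (25, 15),
           (36, 18), (49, 23), (71, 25), (93, 42)]
classes = 2

def filters_for(mask):
    # Each anchor predicts x, y, w, h, objectness plus one score per class.
    return (classes + 5) * len(mask)

# AlexeyAB's suggested split for this dataset: only the largest anchor
# is big enough for the first (coarsest) [yolo] layer.
masks = {"yolo1": [8], "yolo2": [6, 7], "yolo3": [0, 1, 2, 3, 4, 5]}

for name, mask in masks.items():
    print(name, "mask =", mask, "filters =", filters_for(mask))
```

With classes=2, one anchor gives (2+5)*1 = 7 filters, two give 14, and six give 42, matching the values suggested above.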
@AlexeyAB Can you please let me know the possible reasons for this fluctuating mAP?
I have currently set random=0 in the .cfg file and started training. This leads to less fluctuating behavior (than in the previously attached graph).
I have started training with the changed anchors you described before and will share the results once it's done.
Also, could you please give me a bit more interpretation of cloud.png?
And, I tried to train the same dataset on a PyTorch implementation and my mAP converged after 23 epochs. My initial LR was 0.01 and it was decreased by a factor of 10 after 20, 50 and 100 epochs.
Can I set the same LR schedule in the .cfg file?
Thanks
Can you please let me know the possible reasons for this fluctuating mAP?
There can be many reasons.
And, I tried to train the same dataset on a PyTorch implementation and my mAP converged after 23 epochs. My initial LR was 0.01 and it was decreased by a factor of 10 after 20, 50 and 100 epochs. Can I set the same LR schedule in the .cfg file?
If you have 5400 training images and set batch=64, then one epoch = 5400/64 ≈ 84 iterations. So 20 epochs = 1680 iterations, 50 epochs = 4200 iterations, and 100 epochs = 8400 iterations.
Set
steps=1680, 4200, 8400
scales=0.1, 0.1, 0.1
and learning_rate=0.01
instead of https://github.com/AlexeyAB/darknet/blob/099b71d1de6b992ce8f9d7ff585c84efd0d4bf94/cfg/yolov3.cfg#L18
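The epoch-to-iteration conversion above can be sketched as follows (plain Python; assumes 5400 training images and batch=64, as stated in this thread):

```python
train_images = 5400
batch = 64

# One epoch = one pass over the training set, measured in iterations
# (each iteration consumes one batch of images).
iters_per_epoch = train_images // batch   # 84, matching the thread

# The PyTorch-style LR drops at 20, 50 and 100 epochs, translated to
# darknet's iteration-based steps= schedule.
steps = [e * iters_per_epoch for e in (20, 50, 100)]
print("steps=" + ",".join(map(str, steps)))  # steps=1680,4200,8400
```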
Hi @AlexeyAB, I got the below result after following the above LR schedule.
learning_rate=0.01 steps=1680, 4200, 8400 scales=0.1, 0.1, 0.1
But I trained this without the random option in the .cfg file. I can try to train with the random option again and obtain new results. Looking at the mAP graph, I think I reduced the LR too quickly, as it finally converged to 75% mAP, while it could have been better, around 82% (as seen from the graph). I will try to set "scales=0.05, 0.05, 0.05" in the .cfg file and see the results. Do you have any other suggestions?
Also, can I generate a video of the predictions on the validation set using my trained model? I can use the "./build/darknet detector test" option to see the visualizations, but it gives one image at a time. I want to feed in the whole validation set and save the output.
Also, can I generate a video of the predictions on the validation set using my trained model? I can use the "./build/darknet detector test" option to see the visualizations, but it gives one image at a time. I want to feed in the whole validation set and save the output.
Are your validation images frames from a video? If so, just run detection on that video.
Also, you can download http://mplayerwin.sourceforge.net/downloads.html and run this command in the folder that contains only the validation images:
mencoder mf://*.jpg -mf w=1280:h=720:fps=15:type=jpg -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=4000:mbd=2:trell -oac copy -o conveyor_valid.avi
so the video file conveyor_valid.avi will be generated.
Then run:
./darknet detector demo data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights conveyor_valid.avi -out_filename out_conveyor_valid.avi
Also you can try
./darknet detector test data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights < data/conveyor_valid.txt
Are your validation images frames from a video?
No, they are .jpg files located in a folder.
Also, you can download http://mplayerwin.sourceforge.net/downloads.html and run this command in the folder that contains only the validation images
Is there same tool for Ubuntu ?
./darknet detector demo data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights conveyor_valid.avi -out_filename out_conveyor_valid.avi
@AlexeyAB I used this command to draw the bounding boxes on the .avi, but I see a bit of an offset on the detected objects. What could be the problem?
Maybe wrong annotations; check your dataset by using https://github.com/AlexeyAB/Yolo_mark
annotations
I tested on a single image and the BB is perfectly overlaid on the image using the "./darknet detector test" command. It seems to be a problem only when I give an input .avi video. I see the offsets when the objects are relatively close, and not when they are some distance away. Maybe I can try ./darknet detector test data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights < data/conveyor_valid.txt instead of
./darknet detector demo data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights conveyor_valid.avi -out_filename out_conveyor_valid.avi
@AlexeyAB How can I reduce the fps of the generated output video? It's too fast at the moment.
Change the 1st line and comment out the 2nd: https://github.com/AlexeyAB/darknet/blob/099b71d1de6b992ce8f9d7ff585c84efd0d4bf94/src/demo.c#L186-L187
@AlexeyAB Now I get the new mAP, which converged around 81%. Precision = 84%, Recall = 71%, F1 = 77%. However, I got these results without using the "random" flag. I think the results can be better with the multi-scale option.
Yes, try to train with random=1
@AlexeyAB I tried with the random=1 option, but mAP, precision, recall and F1 decreased instead of increasing. Could you please suggest something? Thanks
@AlexeyAB I have a new dataset for which the cloud.png is shown. How can I set the masks for the anchors according to this distribution? Is there any link where I can better understand how to interpret cloud.png?
Hi @aditbhrgv - I found this explanation helpful for determining custom anchors.
Hi @DarylWM, thank you! Can you please explain the significance of cloud.png? I can see the anchors and the training data points distributed around them. Is my understanding correct? If yes, how will the training samples lying outside these anchors be detected? Thanks again!
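For what it's worth, cloud.png is a scatter of every training box's (width, height) with the 9 anchors from calc_anchors overlaid; the anchors are cluster centers of that point cloud. Boxes lying "outside" the anchors are still detected: during training each ground-truth box is matched to the anchor that fits it best, and the network learns to regress the offset from that anchor. A rough, pure-Python sketch of the clustering idea (random data as a stand-in for real labels; darknet's calc_anchors uses an IoU-based distance rather than the Euclidean one used here):

```python
import random

random.seed(0)
# Stand-in for real label data: (width, height) of every training box.
boxes = [(random.uniform(5, 100), random.uniform(5, 100)) for _ in range(500)]

def dist(a, b):
    # Squared Euclidean distance; calc_anchors actually clusters by IoU,
    # but the k-means idea is the same.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=50):
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Every box is assigned to its nearest center, so no box
            # is ever left "outside" the anchor set.
            i = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[i].append(p)
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers

anchors = kmeans(boxes, k=9)
print(sorted((round(w), round(h)) for w, h in anchors))
```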
Hi all,
I am trying to use yolov3-tiny_3l.cfg for my custom dataset with 2 classes. I changed the classes and filters in my .cfg and also the obj.data file. I generated anchors for my custom dataset and put them into the .cfg file.
no_train_images = 5400
no_test_images = 1200
I can see the loss going down, but the mAP fluctuates a lot (see the graph with mAP). How can I solve this problem? Any suggestions? Thanks