astronaut71 opened this issue 5 years ago
@astronaut71 Hi,
- I would recommend you to train the yolov3-spp.cfg model with batch=64 subdivisions=64 width=832 height=832 random=1, if you have enough GPU-RAM, with anchors calculated for 832x832:
./darknet detector calc_anchors /home/darknet/build/darknet/x64/data/workers.data -num_of_clusters 9 -width 832 -height 832 -show
- or, if you don't have enough GPU-RAM or want to train faster: train the yolov3-spp.cfg model with batch=64 subdivisions=64 width=640 height=640 random=1, with anchors calculated for 640x640:
./darknet detector calc_anchors /home/darknet/build/darknet/x64/data/workers.data -num_of_clusters 9 -width 640 -height 640 -show
- or, if you want a faster model (but less accuracy), train the yolov3-tiny-3l.cfg model with batch=64 subdivisions=32 width=832 height=832 random=1, with anchors calculated for 832x832
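For reference, these settings all live in the cfg file itself. A rough sketch of the relevant lines for the 832x832 yolov3-spp.cfg case (the anchors shown are just the stock yolov3 values - replace them, in every [yolo] layer, with the 9 pairs printed by calc_anchors):

[net]
# training batch settings from the recommendation above
batch=64
subdivisions=64
width=832
height=832
...
[yolo]
# replace with the anchors printed by calc_anchors for 832x832
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
# resize the network randomly during training (multi-scale training)
random=1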
I have an RTX 2080 and installed CUDA 10. Should I use the max=200 parameter or not? Should the other parameters in yolov3-spp.cfg remain the same?
Can I also continue the training from the last weights when it is stopped and killed by a segmentation fault (core dumped)?
@astronaut71
Can I also continue the training from the last weights when it is stopped and killed by a segmentation fault (core dumped)?
Yes, you can.
I have an RTX 2080 and installed CUDA 10. Should I use the max=200 parameter or not? Should the other parameters in yolov3-spp.cfg remain the same?
Yes, you can use max=200, especially in the last [yolo] layer.
All other parameters you can keep the same.
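Assuming the standard yolov3-spp.cfg layout, the last [yolo] section would then look roughly like this (classes=1 for a single class like your workers dataset; the other values are the usual defaults):

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=1
num=9
jitter=.3
ignore_thresh = .7
random=1
# allow up to 200 ground-truth objects per image
max=200

(If classes differs from the original cfg, the filters= value of the [convolutional] layer right before each [yolo] layer also has to be changed to (classes+5)*3.)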
Ok. I trained the yolov3-spp.cfg model with batch=64 subdivisions=64 width=832 height=832 random=1. It stopped after 7500 iterations, as you can see in the chart.
So should I continue training from the last weights saved before it stopped, or train from the beginning, faster, with 640x640?
Another thing is avoiding overfitting. How can I do that? So far I can reach at most 90% mAP and the average loss does not go below 0.9. Can I pre-train the model and use some pre-trained weights? I mean, before starting training with the command
./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet53.conv.74 -map
can I use some pre-trained weights to avoid overfitting, or how? What should I do before running ./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet53.conv.74 -map ?
@astronaut71
Ok. I trained the yolov3-spp.cfg model with batch=64 subdivisions=64 width=832 height=832 random=1. It stopped after 7500 iterations, as you can see in the chart.
Did you stop it manually, or did it stop automatically with some error?
Or train from the beginning, faster, with 640x640?
Train it from the beginning.
You can use the pre-trained weights file darknet53.conv.74 for any of your training runs.
To check whether there is overfitting, just set valid=valid.txt in your obj.data file, so the mAP calculated during training will be computed on the validation dataset. Then just take the weights file with the maximum mAP, by using the ./darknet detector map ... command.
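For example (the weights filename here is just illustrative - run it once for each saved .weights file in your backup folder and compare the reported mAP):

./darknet detector map /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/backup/yolov3-spp-832_7000.weights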
It stopped automatically; it was killed. So what should I do? Can I continue the training with the last weights, or should I start training from the beginning with the 640x640 configuration file?
Yes, when I started training I was using the pre-trained weights file, with the command ./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet53.conv.74 -map
It stopped automatically; it was killed.
Just try to update your code from github and recompile.
So what should I do? Can I continue the training with the last weights, or should I start training from the beginning with the 640x640 configuration file?
Just continue training with width=832 height=832 random=1 and the last.weights file:
./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/backup/yolov3-spp-832_last.weights -map
Just try to update your code from github and recompile. What do you mean by this?
Download the latest version of Darknet with fixed bugs.
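One way to do it, assuming you cloned the repo with git and build with make (keep your Makefile settings such as GPU=1 and CUDNN=1):

cd /home/admini/darknet   # or wherever your darknet clone is
git pull                  # get the latest fixes from GitHub
make clean
make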
ah ok
Is the valid.txt file in valid=valid.txt the test file or the train file? Because in the obj.data file I have it set like this:
classes = 1
train = /home/admini/darknet/data/train.txt
valid = /home/admini/darknet/data/test.txt
names = /home/admini/darknet/data/workers.names
backup = /home/admini/darknet/backup
In my case the valid.txt file is the train file. And I have a test file too. Is that ok?
@astronaut71 Yes, it is OK. So training will use train.txt, and checking mAP (during training) will use test.txt.
Ok, great. I'm still training, and at step 15620 I can reach a mAP of 95%, but the average loss is still not going below 0.78. And the recall at 95% mAP is 0.89. Which one is the most relevant metric for accuracy? And is the training going well? Should I keep training until, let's say, 30,000 iterations?
Sorry, actually the obj.data file is:
classes = 1
train = /home/admini/darknet/darknet/data/train.txt
test = /home/admini/darknet/darknet/data/test.txt
names = /home/admini/darknet/darknet/data/workers.names
backup = /home/admini/darknet/darknet/backup
Is test = /home/admini/darknet/darknet/data/test.txt ok, or must it be valid = /home/admini/darknet/darknet/data/test.txt ?
It must be valid = /home/admini/darknet/darknet/data/test.txt
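So the obj.data from above would become (same paths, only the test key renamed to valid, which is the key Darknet reads for the mAP calculation):

classes = 1
train = /home/admini/darknet/darknet/data/train.txt
valid = /home/admini/darknet/darknet/data/test.txt
names = /home/admini/darknet/darknet/data/workers.names
backup = /home/admini/darknet/darknet/backup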
Ok, great. I'm still training, and at step 15620 I can reach a mAP of 95%, but the average loss is still not going below 0.78. And the recall at 95% mAP is 0.89. Which one is the most relevant metric for accuracy?
mAP
And is the training going well?
Yes.
Should I keep training until, let's say, 30,000 iterations?
If mAP increases - then yes.
But those results were with test = /home/admini/darknet/darknet/data/test.txt. So does that mean I need to start training from the beginning with ./darknet detector train /home/admini/darknet/darknet/data/workers.data /home/admini/darknet/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet/darknet53.conv.74 -map, where obj.data has valid = /home/admini/darknet/darknet/data/test.txt? Correct? Or can I just continue training with the last weights and just change obj.data?
Just continue training from the last point. valid= affects only the mAP calculation.
Yes. When I continue training, or even start from the beginning, with the correct valid = /home/admini/darknet/darknet/data/test.txt, the mAP drops to 56%. When starting from the beginning, at step 6500 the mAP is 56% and the average loss is 1. So how do I get the mAP higher again? Is it so low because I have only 120 test images? Or should I just continue training, and that 56% mAP will increase since it is only at step 6500?
I would recommend you to train yolov3-spp.cfg ... or if you want to use a faster model (but less accuracy), train yolov3-tiny-3l.cfg ...
@AlexeyAB Just for my understanding: Why is the yolov3_5l model not considered in this case ?
Thanks for your help !!
Hi
I'm trying to detect multiple small and large objects (workers on a construction site can appear small if the image was taken from far away, or large if it was taken close up). I include two sample images just to show you the size of the objects I am trying to detect and how they look in the image.
Here is my dataset:
So I have some questions:
1. Which cfg file do you recommend for training?
2. What width and height are recommended for training and detection?
3. How do I calculate correct anchors for my dataset?
4. What are num_of_clusters, final_width and final_height, and how do I calculate them for my dataset?
Here are the detected clusters when running:
./darknet detector calc_anchors /home/darknet/build/darknet/x64/data/workers.data -num_of_clusters 9 -width 640 -height 640 -show
Any help?