astronaut71 opened this issue 5 years ago
@astronaut71 Hi,
- I would recommend you to train the yolov3-spp.cfg model with batch=64 subdivisions=64 width=832 height=832 random=1, if you have enough GPU-RAM, with anchors calculated for 832x832:
./darknet detector calc_anchors /home/darknet/build/darknet/x64/data/workers.data -num_of_clusters 9 -width 832 -height 832 -show
- or, if you don't have enough GPU-RAM or want to train faster: train the yolov3-spp.cfg model with batch=64 subdivisions=64 width=640 height=640 random=1, with anchors calculated for 640x640:
./darknet detector calc_anchors /home/darknet/build/darknet/x64/data/workers.data -num_of_clusters 9 -width 640 -height 640 -show
- or, if you want a faster model (but less accuracy), train the yolov3-tiny-3l.cfg model with batch=64 subdivisions=32 width=832 height=832 random=1, with anchors calculated for 832x832
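For reference, these settings all live in the cfg file itself. A rough sketch of the relevant lines for the 832x832 yolov3-spp.cfg case (the anchors shown are just the stock yolov3 values - replace them, in every [yolo] layer, with the 9 pairs printed by calc_anchors):

[net]
# training batch settings from the recommendation above
batch=64
subdivisions=64
width=832
height=832
...
[yolo]
# replace with the anchors printed by calc_anchors for 832x832
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
# resize the network randomly during training (multi-scale training)
random=1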
I have an RTX 2080 and installed CUDA 10. Should I use the max=200 parameter or not? Should the other parameters in yolov3-spp.cfg remain the same?
Can I also continue the training from the last weights when it is stopped and killed by a segmentation fault (core dumped)?
@astronaut71
Can I also continue the training from the last weights when it is stopped and killed by a segmentation fault (core dumped)?
Yes, you can.
I have an RTX 2080 and installed CUDA 10. Should I use the max=200 parameter or not? Should the other parameters in yolov3-spp.cfg remain the same?
Yes, you can use max=200, especially in the last [yolo] layer.
All other parameters you can keep the same.
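Assuming the standard yolov3-spp.cfg layout, the last [yolo] section would then look roughly like this (classes=1 for a single class like your workers dataset; the other values are the usual defaults):

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=1
num=9
jitter=.3
ignore_thresh = .7
random=1
# allow up to 200 ground-truth objects per image
max=200

(If classes differs from the original cfg, the filters= value of the [convolutional] layer right before each [yolo] layer also has to be changed to (classes+5)*3.)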
Ok. I trained the yolov3-spp.cfg model with batch=64 subdivisions=64 width=832 height=832 random=1. It stopped after 7500 iterations, as you can see in the chart.
So should I continue training from the last weights saved before it stopped, or train from the beginning, faster, with 640x640?
Another thing is avoiding overfitting. How can I do that? So far I can reach at most 90% mAP and the average loss does not go below 0.9. Can I pre-train the model and use some pre-trained weights? I mean, before starting training with the command
./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet53.conv.74 -map
can I use some pre-trained weights to avoid overfitting, or how? What should I do before running ./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet53.conv.74 -map ?
@astronaut71
Ok. I trained the yolov3-spp.cfg model with batch=64 subdivisions=64 width=832 height=832 random=1. It stopped after 7500 iterations, as you can see in the chart.
Did you stop it manually, or did it stop automatically with some error?
Or train from the beginning, faster, with 640x640?
Train it from the beginning.
You can use the pre-trained weights file darknet53.conv.74 for any of your training runs.
To check whether there is overfitting, just set valid=valid.txt in your obj.data file, so the mAP calculated during training will be computed on the validation dataset. Then just take the weights file with the maximum mAP, by using the ./darknet detector map ... command.
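For example (the weights filename here is just illustrative - run it once for each saved .weights file in your backup folder and compare the reported mAP):

./darknet detector map /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/backup/yolov3-spp-832_7000.weights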
It stopped automatically; it was killed. So what should I do? Can I continue the training with the last weights, or should I start training from the beginning with the 640x640 configuration file?
Yes, when I started training I was using the pre-trained weights file, with the command ./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet53.conv.74 -map
It stopped automatically; it was killed.
Just try to update your code from github and recompile.
So what should I do? Can I continue the training with the last weights, or should I start training from the beginning with the 640x640 configuration file?
Just continue training with width=832 height=832 random=1 and the last.weights file:
./darknet detector train /home/admini/darknet/data/workers.data /home/admini/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/backup/yolov3-spp-832_last.weights -map
Just try to update your code from github and recompile. What do you mean by this?
Download the latest version of Darknet with fixed bugs.
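One way to do it, assuming you cloned the repo with git and build with make (keep your Makefile settings such as GPU=1 and CUDNN=1):

cd /home/admini/darknet   # or wherever your darknet clone is
git pull                  # get the latest fixes from GitHub
make clean
make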
ah ok
Is the valid.txt file in valid=valid.txt the test file or the train file? Because in the obj.data file I have it set like this:
classes = 1
train = /home/admini/darknet/data/train.txt
valid = /home/admini/darknet/data/test.txt
names = /home/admini/darknet/data/workers.names
backup = /home/admini/darknet/backup
In my case the valid.txt file is the train file. And I have a test file too. Is that ok?
@astronaut71 Yes, it is OK. So training will use train.txt, and checking mAP (during training) will use test.txt.
Ok, great. I'm still training, and at step 15620 I can reach a mAP of 95%, but the average loss is still not going below 0.78. And the recall at 95% mAP is 0.89. Which one is the most relevant metric for accuracy? And is the training going well? Should I keep training until, let's say, 30,000 iterations?
Sorry, actually the obj.data file is:
classes = 1
train = /home/admini/darknet/darknet/data/train.txt
test = /home/admini/darknet/darknet/data/test.txt
names = /home/admini/darknet/darknet/data/workers.names
backup = /home/admini/darknet/darknet/backup
Is test = /home/admini/darknet/darknet/data/test.txt ok, or must it be valid = /home/admini/darknet/darknet/data/test.txt ?
It must be valid = /home/admini/darknet/darknet/data/test.txt
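So the obj.data from above would become (same paths, only the test key renamed to valid, which is the key Darknet reads for the mAP calculation):

classes = 1
train = /home/admini/darknet/darknet/data/train.txt
valid = /home/admini/darknet/darknet/data/test.txt
names = /home/admini/darknet/darknet/data/workers.names
backup = /home/admini/darknet/darknet/backup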
Ok, great. I'm still training, and at step 15620 I can reach a mAP of 95%, but the average loss is still not going below 0.78. And the recall at 95% mAP is 0.89. Which one is the most relevant metric for accuracy?
mAP
And is the training going well?
Yes.
Should I keep training until, let's say, 30,000 iterations?
If mAP increases - then yes.
But those results were with test = /home/admini/darknet/darknet/data/test.txt. So does that mean I need to start training from the beginning with ./darknet detector train /home/admini/darknet/darknet/data/workers.data /home/admini/darknet/darknet/cfg/yolov3-spp-832.cfg /home/admini/darknet/darknet/darknet53.conv.74 -map, where obj.data has valid = /home/admini/darknet/darknet/data/test.txt? Correct? Or can I just continue training with the last weights and just change obj.data?
Just continue training from the last point. valid= affects only the mAP calculation.
Yes. When I continue training, or even start from the beginning, with the correct valid = /home/admini/darknet/darknet/data/test.txt, the mAP drops to 56%. When starting from the beginning, at step 6500 the mAP is 56% and the average loss is 1. So how do I get the mAP higher again? Is it so low because I have only 120 test images? Or should I just continue training, and that 56% mAP will increase since it is only at step 6500?
I would recommend you to train yolov3-spp.cfg ... or if you want to use a faster model (but less accuracy), train yolov3-tiny-3l.cfg ...
@AlexeyAB Just for my understanding: Why is the yolov3_5l model not considered in this case ?
Thanks for your help !!
Hi
I'm trying to detect multiple small and large objects (workers on a construction site can appear small if the image was taken from far away, or large if it was taken close up). I include two sample images just to show you the size of the objects I am trying to detect and how they look in the image.
Here is my dataset:
So I have some questions:
1. Which cfg file do you recommend for training?
2. What width and height are recommended for training and detection?
3. How do I calculate correct anchors for my dataset?
4. What are num_of_clusters, final_width and final_height, and how do I calculate them for my dataset?
Here are the detected clusters when running:
./darknet detector calc_anchors /home/darknet/build/darknet/x64/data/workers.data -num_of_clusters 9 -width 640 -height 640 -show
Any help?