Open chungkaihsieh opened 5 years ago
Have you tried yolov3-tiny_3l? It's a new config that was recently added. It has one more YOLO layer, so it should perform better but run a little slower.
@chungkaihsieh Hi,
Try to use: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_3l.cfg
with recalculated anchors:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608
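For intuition about what anchor recalculation does, here is a minimal Python sketch of k-means clustering over box sizes with IoU as the similarity measure. It is an illustration only, not darknet's own `calc_anchors` code; the function names and the sample boxes are hypothetical. It assumes each box is a (width, height) pair already scaled to the network resolution (for example, normalized label sizes multiplied by 608).

```python
import random


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchors, sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k randomly chosen boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical box sizes in pixels at 608x608: small, medium and large people.
boxes = [(10, 20), (12, 22), (100, 200), (110, 190), (400, 500), (420, 480)]
print(kmeans_anchors(boxes, k=3))
```

If the dataset mixes very small and very large people, the resulting anchors should span both ends of the size range; anchors that all cluster near the small sizes would be consistent with large people being missed.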
Thank you, @AlexeyAB @Deadmin1!
I tried the new yolov3-tiny_3l.cfg yesterday. The good news is that some large bounding boxes now appear, compared to the 2-layer config. However, the result is still not good enough, even when I test on training images. For example, in a selfie of 5 people (upper body only), only 2 of the 5 are detected.
Below are the settings and status:
Pros & Cons of current weights:
I would like to improve detection of people who appear only as an upper body in images. Could you kindly give me some tips? Thanks a lot for your help :)
@chungkaihsieh
yolov3-tiny.cfg
and change these lines: https://github.com/AlexeyAB/darknet/blob/fd0df9297c86a272f0bf0841291bc4565e90a7cd/cfg/yolov3-tiny.cfg#L107-L121 to these lines - and train from the beginning:
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
yolov3-tiny_3l.cfg
and change these lines: https://github.com/AlexeyAB/darknet/blob/fd0df9297c86a272f0bf0841291bc4565e90a7cd/cfg/yolov3-tiny_3l.cfg#L108-L122 to these lines - and train from the beginning:
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
Hi AlexeyAB, thanks again for your help. I will check the dataset and try these configurations, and I will post the results for further discussion.
@AlexeyAB Hi,
Sorry for the late update, and thanks again for your kind help. I would like to show you some results and ask some questions. The good news is that after following your suggestion, training from the beginning with more conv layers can detect people of varied sizes. 💯 But there are some drawbacks compared to the pre-trained weights (2-layer feature map).
Thanks a lot for your time and kindness. CK Hsieh
@chungkaihsieh Hi,
As I understand, this is the best cfg-file for you, but you want to reduce BFLOPs:
- YOLOv3-tiny-2layers (without pre-trained weights, with more convolutional filters added)
  a. FLOPS: 10.291 Bn
  b. AP = 42.21% with high FP (50000 steps)
  c. Can detect varied-size people, but some large-size people still fail, and chairs are detected as people.
So use the same number of convolutional layers, but with half as many filters.
That is, use yolov3-tiny.cfg
and change these lines: https://github.com/AlexeyAB/darknet/blob/fd0df9297c86a272f0bf0841291bc4565e90a7cd/cfg/yolov3-tiny.cfg#L107-L121
to these lines - and train from the beginning:
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
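To see roughly how much halving the filters saves, here is a small Python sketch that estimates the cost of the two six-layer stacks. It counts one multiply-add as 2 operations per output element, which I believe matches how darknet reports BFLOPs, but treat that as an assumption. The 1024 input channels and the 19x19 grid (608 / 32) are also assumptions chosen for illustration, so the absolute numbers will not match any particular training log.

```python
def conv_flops(in_channels, filters, size, out_w, out_h):
    """Operations for one convolutional layer (multiply-add counted as 2)."""
    return 2 * filters * size * size * in_channels * out_w * out_h


def stack_bflops(in_channels, layers, out_w, out_h):
    """Total BFLOPs for a chain of (filters, size) layers with stride=1, pad=1."""
    total = 0
    channels = in_channels
    for filters, size in layers:
        total += conv_flops(channels, filters, size, out_w, out_h)
        channels = filters  # the next layer consumes this layer's output
    return total / 1e9


# (filters, size) pairs copied from the two cfg fragments in this thread.
WIDE = [(512, 1), (1024, 3)] * 3
NARROW = [(256, 1), (512, 3)] * 3

GRID = 19           # assumed feature-map size for a 608x608 input at stride 32
IN_CHANNELS = 1024  # assumed number of channels entering the stack

wide = stack_bflops(IN_CHANNELS, WIDE, GRID, GRID)
narrow = stack_bflops(IN_CHANNELS, NARROW, GRID, GRID)
print(f"wide: {wide:.3f} BFLOPs, narrow: {narrow:.3f} BFLOPs, "
      f"ratio: {wide / narrow:.2f}x")
```

Because halving the filters halves both the input and the output channels of most layers in the stack, the saving for the stack is close to 4x rather than 2x.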
What ratio of negative samples is reasonable for fine-tuning and for training from the beginning, respectively? Does it matter? I found that if the ratio of negative samples is too large, recall decreases dramatically.
Usually 1:1. But it depends on what is more important for you:
if you want to decrease False-Positives - then use more negative-samples (images with backgrounds without objects)
if you want to decrease False-Negatives - then use more images with objects than images with backgrounds
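To check the current ratio in a dataset, a minimal Python sketch follows. It assumes the usual darknet layout where every image has a matching .txt label file and a negative sample is an image whose label file is empty. The function name is hypothetical, and it assumes the directory holds only label files (no train.txt or class-name lists).

```python
from pathlib import Path


def count_samples(label_dir):
    """Return (positives, negatives) for the .txt label files in label_dir.

    A label file with at least one non-blank line counts as a positive sample;
    an empty (or whitespace-only) file counts as a negative sample.
    """
    positives = negatives = 0
    for label_file in Path(label_dir).glob("*.txt"):
        if label_file.read_text().strip():
            positives += 1
        else:
            negatives += 1
    return positives, negatives
```

For example, `count_samples("data/obj")` (a hypothetical path) returning counts close to each other corresponds to the 1:1 ratio mentioned above.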
In my case, how many steps would you recommend for fine-tuning and for training from the beginning?
You should train until mAP stops increasing: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
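One way to apply this rule is to evaluate mAP on the validation set for each saved weights file and pick the checkpoint where it peaked. The Python sketch below is a hypothetical helper, not part of darknet; the `patience` and `min_delta` parameters and the mAP values in the example are made up for illustration.

```python
def best_checkpoint(history, patience=3, min_delta=0.0):
    """Find the best checkpoint in a list of (iteration, mAP) pairs.

    history must be in training order. Returns (best_iteration, best_map,
    should_stop), where should_stop is True once mAP has failed to improve
    by more than min_delta for `patience` consecutive evaluations.
    """
    best_iter, best_map = None, float("-inf")
    stale = 0  # evaluations since the last improvement
    for iteration, map_value in history:
        if map_value > best_map + min_delta:
            best_iter, best_map = iteration, map_value
            stale = 0
        else:
            stale += 1
    return best_iter, best_map, stale >= patience


# Hypothetical validation mAP (%) measured every 1000 iterations.
history = [(1000, 30.0), (2000, 38.5), (3000, 42.2),
           (4000, 42.0), (5000, 41.8), (6000, 41.9)]
print(best_checkpoint(history))
```

With these made-up numbers, the helper points at the weights saved at iteration 3000 and reports that training can stop.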
@AlexeyAB Hi,
Thank you for your research and network improvement.
I work with your network yolov3-tiny_3l.cfg. It's amazing! However, the average loss after all 200,000 iterations was ~0.8 (the anchors were recalculated). For my 22 classes, this is most likely not enough iterations. Is it possible to improve network performance by changing the learning schedule? What values would you recommend?
Thanks so much.
@umbralada Hi,
I work with your network yolov3-tiny_3l.cfg. It's amazing! However, the average loss after all 200,000 iterations was ~0.8 (the anchors were recalculated). For my 22 classes, this is most likely not enough iterations. Is it possible to improve network performance by changing the learning schedule? What values would you recommend?
The more layers, the higher the accuracy (mAP), but also the higher the loss. So don't worry about the loss; check mAP instead.
Thank you, @AlexeyAB
@AlexeyAB Hi,
Thanks in advance for your help. I am trying to detect people of varied sizes (people only) with yolov3-tiny.cfg (608x608). The number of people per image ranges from about 1 to 100. The images contain selfies (large people), crowds (small people), and selfies with crowds (both large and small people). I followed the instructions, recalculated the anchors, and found that the models perform well on small people. However, large people are not detected.
I have tested several pre-trained models on my data and concluded that yolov3-tiny is probably what I want.
Could you give me some advice on detecting people of varied sizes?
Thanks for your time and consideration. CK Hsieh