@Kyuuki93
Also, since we don't use Deformable-conv, you can try to use an RFB-block with a flexible receptive field from 1x1 to 11x11: #4507 (comment)
> By changing `activation = linear` to `activation = leaky`?

You can use `activation = linear` for conv-layers.
Generally, by adding `[maxpool] maxpool_depth=1`:

```
[route]
layers = -1,-5,-9,-12

[maxpool]
maxpool_depth=1
out_channels=64
size=1
stride=1

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=linear

[shortcut]
from=-16
activation=leaky
```
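For reference, `maxpool_depth=1` takes the max across groups of channels instead of across a spatial window, so W x H stay the same and the channel count drops to `out_channels`. A minimal C sketch of the assumed semantics (names and layout are illustrative, not darknet's actual code):

```c
#include <float.h>

// Channel-wise max pooling (maxpool_depth=1): at every spatial position the
// C input channels are split into out_channels groups and the max is taken
// inside each group, so W and H stay unchanged. CHW layout is assumed.
void maxpool_depth_forward(const float *in, float *out,
                           int w, int h, int c, int out_channels)
{
    int group = c / out_channels;               // channels folded into one output map
    for (int oc = 0; oc < out_channels; ++oc)
        for (int i = 0; i < w * h; ++i) {
            float m = -FLT_MAX;
            for (int g = 0; g < group; ++g) {
                float v = in[(oc * group + g) * w * h + i];
                if (v > m) m = v;
            }
            out[oc * w * h + i] = m;
        }
}
```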
@Kyuuki93 @AlexeyAB Hi,
When can we use ASFF? Please share the final cfg. Have you tried to compare ASFF and csresnext50-panet-spp?
@zpmmehrdad
- yolov3-spp + ASFF: yolov3-spp-asff-it.cfg.txt
- yolov3-spp + ASFF + DropBlock + RFB(bn=0): yolov3-spp-asff-db-it-rfb.cfg.txt

You can set `bn=1` in the RFB blocks to get better results.
> have you tried to compare ASFF and csresnext50-panet-spp?

Not yet, you can compare it on your data. Also, this ASFF is based on darknet-53 but csresnext50-panet-spp is based on resnext, so maybe we should implement ASFF in resnext50 for a fair comparison.
@Kyuuki93 Thanks for your reply,
I saw the table that you shared, and in it "spp,giou,it=0.213,asff(softmax),rfb(bn=0)" has a good result at AP@.75. I'm going to use it for ~200 classes, some classes are almost the same, and AP@.75 is important for me. Do you think "spp,giou,it=0.213,asff(softmax),rfb(bn=0)" is a good option for me or not?
Thanks

> I'm going to use it for ~200 classes, some classes are almost the same, and AP@.75 is important for me. Do you think "spp,giou,it=0.213,asff(softmax),rfb(bn=0)" is a good option for me or not?
Try to compare `spp,giou,it=0.213,asff(softmax),rfb(bn=0)` and `spp,giou,it=0.213,asff(softmax),rfb(bn=1)`. This ASFF module hasn't been tested enough, so I'm not sure it can improve AP@.75 on every dataset; I used it on a one-class dataset, so if you get results please share them with us.
@Kyuuki93

> Not yet, you can compare it on your data. Also, this ASFF is based on darknet-53 but csresnext50-panet-spp is based on resnext, so maybe we should implement ASFF in resnext50 for a fair comparison.
Look at this comparison: https://github.com/AlexeyAB/darknet/issues/4406#issuecomment-567919052
For GPUs without Tensor Cores ResNext50 was better than Darknet53, but for Volta/Turing (RTX) GPUs and newer, it seems that Darknet53 is better.
So maybe we should use a CSPDarkNet-53 backbone rather than CSPResNeXt-50: https://github.com/WongKinYiu/CrossStagePartialNetworks#big-models
Also, maybe 1 block of BiFPN (based on NORM_CHAN_SOFTMAX) can be better than ASFF.
> So maybe we should use a CSPDarkNet-53 backbone rather than CSPResNeXt-50: https://github.com/WongKinYiu/CrossStagePartialNetworks#big-models

It seems worth trying.

> Also, maybe 1 block of BiFPN (based on NORM_CHAN_SOFTMAX) can be better than ASFF.
I have a question about BiFPN: with 3 yolo layers, should BiFPN just keep P3-P5 and ignore P6-P7?
Btw, I have made a Spinenet-49 with 3 yolo layers, spinenet.cfg.txt; you can check it or take a test.
Training from scratch is a little bit slow...
@AlexeyAB Also, take a look at this: https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-568696075. It seems `gaussian_yolo` hurts recall heavily, and a low `iou_thresh` can significantly improve it.
Next I want to find out the relation between precision/recall and `ignore_thresh`/`truth_thresh`.
@Kyuuki93

- `[Gaussian_yolo]` introduces `bbox_confidence_score` in the range (0 - 1), so `confidence_score = class_conf * bbox_conf` will be lower than the `confidence_score = class_conf` of `[yolo]`. This decreases the number of bboxes with `confidence > conf_thresh`, so it increases Precision and decreases Recall for the same conf_threshold.
- `iou_thresh=0.213` allows Yolo to use many not-the-most-suitable anchors for one object. This increases the number of bboxes (but the additional bboxes are less accurate), so it increases Recall and decreases Precision for the same conf_threshold.
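A quick numeric illustration of the first point (the numbers are made up):

```c
#include <stdio.h>

int main(void)
{
    float class_conf = 0.90f, bbox_conf = 0.80f, conf_thresh = 0.80f;

    float yolo_score     = class_conf;              // [yolo]
    float gaussian_score = class_conf * bbox_conf;  // [Gaussian_yolo]: 0.72

    // The same detection passes the threshold under [yolo] but not under
    // [Gaussian_yolo]: fewer boxes survive, so Precision rises and Recall
    // falls at the same conf_thresh.
    printf("yolo kept: %d, gaussian_yolo kept: %d\n",
           yolo_score > conf_thresh, gaussian_score > conf_thresh);
    return 0;
}
```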
Model | AP@.5 | precision (th=0.85) | recall(th=0.85) | precision (th=0.7) | recall(th=0.7) |
---|---|---|---|---|---|
spp,mse | 89.50% | 0.98 | 0.20 | 0.97 | 0.36 |
spp,giou | 90.09% | 0.98 | 0.25 | 0.97 | 0.40 |
spp,ciou | 89.88% | 0.99 | 0.22 | 0.97 | 0.38 |
spp,giou,gs | 91.39% | 0.99 | 0.05 | 0.97 | 0.47 |
spp,giou,gs,it | 91.87% | 0.99 | 0.16 | 0.97 | 0.52 |
> I have a question about BiFPN: with 3 yolo layers, should BiFPN just keep P3-P5 and ignore P6-P7?
Yes. You can just get features from these 3 points (P3, P4, P5). And use NORM_CHAN_SOFTMAX
Or you can get features from earlier points (figure below)
And you can duplicate BiFPN-block many times (from 2 to 8 BiFPN blocks) - page 5, table 1: https://arxiv.org/pdf/1911.09070v1.pdf
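For reference, a minimal C sketch of what a per-position softmax across channels (the assumed behavior of NORM_CHAN_SOFTMAX / `activation=normalize_channels_softmax`) computes; names and layout are illustrative:

```c
#include <math.h>

// Softmax across the channel dimension at every spatial position (x, y),
// so the c values at each pixel become fusion weights that sum to 1.
// CHW layout is assumed.
void normalize_channels_softmax(const float *in, float *out,
                                int w, int h, int c)
{
    for (int i = 0; i < w * h; ++i) {
        float max = in[i];
        for (int k = 1; k < c; ++k)
            if (in[k * w * h + i] > max) max = in[k * w * h + i];
        float sum = 0.f;
        for (int k = 0; k < c; ++k)
            sum += expf(in[k * w * h + i] - max);
        for (int k = 0; k < c; ++k)
            out[k * w * h + i] = expf(in[k * w * h + i] - max) / sum;
    }
}
```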
> Btw, I have made a Spinenet-49 with 3 yolo layers, spinenet.cfg.txt; you can check it or take a test.
I did not go into the details of how the Spinenet should look. Several questions:

- Can you show a link to the code (Pytorch/TF/...) from which you copied the Spinenet?
- `masks=` of the `[yolo]` layers should be fixed for the P3, P2, P4 sizes
- Why did you remove the top 2 blocks (P5, P6)?
> Can you show a link to the code (Pytorch/TF/...) from which you copied the Spinenet?

There is no public spinenet in any framework yet; this cfg was based on the figure in the SpineNet paper.

> `masks=` of the `[yolo]` layers should be fixed for the P3, P2, P4 sizes

I did not notice that, I will change it.

> Why did you remove the top 2 blocks (P5, P6)?

The feature maps in P5, P6 were too small, I think.
> 2 shortcut layers are pointed to the same layer-5

- layer-9 was the shortcut in the 2nd bottleneck block, with `activation=leaky`
- layer-10 was the input of the 3rd bottleneck, with `activation=linear`

> Is it normal that some of your layers have 19 BFLOPS?

I checked the model and compared the ratio here; I think the 19 BFLOPS is right, and it's the result of using a residual block instead of a bottleneck block, which is the diamond block in the previous figure.
> 2 shortcut layers are pointed to the same layer-5
>
> layer-9 was the shortcut in the 2nd bottleneck block, with activation=leaky; layer-10 was the input of the 3rd bottleneck, with activation=linear

What do you mean? Do you mean this is correct?

```
# 9 b2
[shortcut]
from=-4
activation=leaky

# 10 b3 3rd gray rectangle block
# from b1,b2
[shortcut]
from=-5
activation=leaky
```
> What do you mean? Do you mean this is correct?

```
# 9 b2
[shortcut]
from=-4
activation=leaky

# 10 b3 3rd gray rectangle block
# from b1,b2
[shortcut]
from=-5
activation=leaky   # -> should be linear
```

My mistake, but the two shortcut layers from layer-5 are correct.
@AlexeyAB this spinenet, which had one shortcut layer set to the wrong activation function and used 3 yolo layers on P2, P3, P4, got 88.80% AP@.5 and 52.26% AP@.75 on the previous one-class dataset, training from scratch with the settings

```
width,height = 384,384
random=0
iou_loss=giou
iou_thresh=0.213
```

but it is 1.5x slower than yolov3-spp; I will run more tests.
@Kyuuki93 Try to train both yolov3-spp and fixed-spinenet without pre-trained weights and with the same other settings.
It seems `counters_per_class` does not work out the data-imbalance issue. Compared with class weights in a classification problem, where the classification loss is produced by the class only, in a detection problem the loss is produced by class, location and even objectness, and the loss from loc and obj is not relevant to the class label, so a loss multiplier on the class part alone cannot work this out. Actually, in my dataset the class of a found box gets high accuracy; it seems the model just can't find the objects of class-0, which lacks data in the training dataset.
I added 2 fixes, so now `counters_per_class` affects the objectness and bbox too:
https://github.com/AlexeyAB/darknet/commit/35a3870979e0d819208a3de7a24c39cc0539651d
https://github.com/AlexeyAB/darknet/commit/b8fe630119fea81200f6ca4641ce2514d893df04
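As a rough sketch, per-class multipliers inversely proportional to the label counts could look like this (the formula and names here are assumptions for illustration, not the exact darknet code):

```c
// Per-class loss multipliers inversely proportional to label counts,
// normalized so the average weight is 1. The multiplier would then scale
// the class, objectness and bbox deltas for anchors matched to class c.
void class_multipliers(const int *counters_per_class, int classes, float *mult)
{
    float total = 0.f;
    for (int c = 0; c < classes; ++c) total += (float)counters_per_class[c];
    for (int c = 0; c < classes; ++c) {
        int n = counters_per_class[c] > 0 ? counters_per_class[c] : 1;
        mult[c] = total / (classes * (float)n);   // rare classes get weight > 1
    }
}
```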
For a comparison of spinenet (fixed, 5 yolo-layers) and yolov3-spp (3 yolo-layers), training from scratch with the same settings:

```
width = 384
height = 384
batch = 96
subdivisions = 16
learning_rate = 0.00025
burn_in = 1000
max_batches = 30200
policy = steps
steps = 15000,20000,25000
scales = .1,.1,.1
...
random = 0
iou_loss = giou
iou_normalizer = 0.5
iou_thresh = 0.213
```
Network | AP@.5 | AP@.75 | precision(.7) | recall(.7) | Inference time |
---|---|---|---|---|---|
spinenet49-5l | 90.46% | 53.80% | 0.93 | 0.71 | 32.17ms |
yolov3-spp | 89.98% | 54.47% | 0.96 | 0.53 | 11.77ms |
@AlexeyAB
Is there any op like `nn.Parameter()` in this repo for implementing the `wi` in BiFPN?
@Kyuuki93

> Is there any op like nn.Parameter() in this repo for implementing the wi in BiFPN?

What do you mean?
If you want Unbounded fusion, then just use `activation=linear` instead of `activation=NORM_CHAN_SOFTMAX`.
@AlexeyAB
For example, `wi` is a scalar: `P4_mid = Conv( (w1*P4_in + w2*Resize(P5_in)) / (w1 + w2) )`. This `wi` should be trainable but not dependent on any feature map.
In ASFF, `w` was calculated from the feature map through a conv layer.
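For reference, the EfficientDet paper (arXiv:1911.09070) defines this weighted "fast normalized fusion" as

$$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j}\, I_i, \qquad w_i \ge 0 \ \text{(enforced by ReLU)}, \qquad \epsilon = 10^{-4},$$

where each $w_i$ is a free trainable scalar (or per-channel vector) and the $I_i$ are the input feature maps - i.e. the weights are not computed from the feature maps as in ASFF.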
@Kyuuki93

> In ASFF, w was calculated from the feature map through a conv layer.

Do you mean that it is not so in BiFPN? https://github.com/xuannianz/EfficientDet/blob/ccc795781fa173b32a6785765c8a7105ba702d0b/model.py
If you want `w` constant during inference, then you can do something like this:
```
[route]
layers = P4

[convolutional]
batch_normalize=1
filters=256
groups=256
size=1
stride=1
pad=1
activation=linear

[route]
layers = P5

[convolutional]
batch_normalize=1
filters=256
groups=256
size=1
stride=1
pad=1
activation=linear

[shortcut]
from = -3
```
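In this construction, a 1x1 `[convolutional]` with `groups` equal to `filters` is depthwise, so each channel is only multiplied by one learned scalar (plus the batch-norm scale and shift), and the `[shortcut]` then sums the two rescaled maps: effectively a constant, per-channel `w`. A minimal C sketch of the assumed per-channel rescale (ignoring batch-norm folding):

```c
// Depthwise 1x1 convolution: one learned scalar weight and bias per channel,
// i.e. a constant, trainable per-channel fusion weight. wh = W*H, CHW layout.
void depthwise_1x1_forward(const float *in, float *out,
                           const float *weight, const float *bias,
                           int wh, int c)
{
    for (int k = 0; k < c; ++k)
        for (int i = 0; i < wh; ++i)
            out[k * wh + i] = weight[k] * in[k * wh + i] + bias[k];
}
```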
> For a comparison of spinenet (fixed, 5 yolo-layers) and yolov3-spp (3 yolo-layers), training from scratch with the same settings

Also try to compare with spinenet (fixed, 3 yolo-layers) + spp, where an SPP-block is added to the P5 or P6 block: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-568950286
@AlexeyAB

> Do you mean that it is not so in BiFPN? https://github.com/xuannianz/EfficientDet/blob/ccc795781fa173b32a6785765c8a7105ba702d0b/model.py

`def build_BiFPN()` here is not so, it is without `w`:
https://github.com/xuannianz/EfficientDet/blob/ccc795781fa173b32a6785765c8a7105ba702d0b/model.py#L40-L93
`def build_wBiFPN()` here is BiFPN with `w`:
https://github.com/xuannianz/EfficientDet/blob/ccc795781fa173b32a6785765c8a7105ba702d0b/model.py#L96-L149
`w` was defined here; actually, we need a layer like this one:
https://github.com/xuannianz/EfficientDet/blob/ccc795781fa173b32a6785765c8a7105ba702d0b/layers.py#L33-L60
Maybe adding a `weights` option to the `[shortcut]` layer is an option; also, `[shortcut]` could take more than 2 inputs, something like:

```
[shortcut]
from=P4, P5_up
weights_type = feature (or channel or pixel)
weights_normalization = relu (or softmax or linear)
activation = linear
```
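A minimal C sketch of what `weights_type=feature` with `weights_normalization=relu` could compute for n inputs; it follows BiFPN's fast normalized fusion, but since this layer did not exist in darknet at this point, the behavior is an assumption:

```c
#include <math.h>

// Weighted sum of n feature maps with one scalar weight per input
// ("feature" granularity) and ReLU-normalized weights, as in BiFPN.
// size = W*H*C of one input; all inputs must have equal shape.
void weighted_shortcut(const float **in, float *out, const float *w,
                       int n, int size)
{
    const float eps = 0.0001f;
    float sum = eps;
    for (int k = 0; k < n; ++k) sum += fmaxf(w[k], 0.f);
    for (int i = 0; i < size; ++i) {
        float v = 0.f;
        for (int k = 0; k < n; ++k)
            v += fmaxf(w[k], 0.f) * in[k][i];
        out[i] = v / sum;
    }
}
```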
The feature map on P6 is only 4x4; could that be too small to get useful features?
Normally, SPP sits in the middle and connects the Backbone and the FPN, like `Backbone -> SPP -> FPN`. But in Spinenet49 it seems the whole network is an FPN.
@AlexeyAB I moved the spinenet-related comments to their own issue.
@Kyuuki93

> The feature map on P6 is only 4x4; could that be too small to get useful features?

Yes, then spp should be placed in P5 (especially if you use a small initial network resolution).
> [shortcut]
> from=P4, P5_up
> weights_type = feature (or channel or pixel)
> weights_normalization = relu (or softmax or linear)
> activation = linear

Yes, or maybe just `feature` is enough, without `channel` or `pixel`.
Interestingly, would the fusion from BiFPN be more effective than a fusion like this?

- `w` - a vector (per-channel), as in BiFPN with ReLU
- `batch_normalize=1` - will do the normalization, to solve the training-instability issue

```
[route]
layers= L1, L2, L3   # output: W x H x 3*C

[convolutional]
batch_normalize=1
filters=3*C
groups=3*C
size=1
stride=1
pad=1
activation=leaky

[local_avgpool]
avgpool_depth = 1    # isn't implemented yet
# avg across C instead of WxH - same meaning as maxpool_depth=1 in [maxpool]
out_channels = C
```
@Kyuuki93
It seems that the higher `ignore_thresh=0.85` is better than `ignore_thresh=0.7` for your dataset: https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-568696075
Also `truth_thresh=1.0` is good.
So for your dataset it is better to use `iou_thresh=1.0` (or not use it at all).
@AlexeyAB

> It seems that the higher ignore_thresh=0.85 is better than ignore_thresh=0.7 for your dataset.

`ignore_thresh = 0.85` got a higher AP@.5 but much lower recall than `ignore_thresh = 0.7`.

> Also truth_thresh=1.0 is good.

Actually, `truth_thresh` only worked at 1.0:

- with `truth_thresh` and `ignore_thresh` both set to 0.7, the network becomes untrainable
- with `ignore_thresh = 0.7`, `truth_thresh = 0.85`, performance decreases

> So for your dataset it is better to use iou_thresh=1.0 (or not use it at all).

What do you mean? For now, all training is with `iou_thresh = 0.213`; do you mean set `iou_thresh=1.0` when changing `truth_thresh` or `ignore_thresh`?
Other one-stage methods work with a dual threshold, such as `ignore_thresh = 0.3` and `truth_thresh = 0.5`, but yolo works with a single threshold, `ignore_thresh = 0.7`; this is also mentioned in yolov3's paper but not explained, and I just wonder why.
@Kyuuki93
Happy New Year! :fireworks: :sparkler:

> What do you mean? For now, all training is with iou_thresh = 0.213; do you mean set iou_thresh=1.0 when changing truth_thresh or ignore_thresh?

I mean it may be better to use for your dataset:

```
ignore_thresh = 0.7
truth_thresh = 1.0
iou_thresh = 1.0
```

While for MS COCO it may be better to use:

```
ignore_thresh = 0.7
truth_thresh = 1.0
iou_thresh = 0.213
```
> Other one-stage methods work with a dual threshold, such as ignore_thresh = 0.3 and truth_thresh = 0.5, but yolo works with a single threshold, ignore_thresh = 0.7; this is also mentioned in yolov3's paper but not explained, and I just wonder why.

What methods do you mean?
In the original Darknet there are several issues which may degrade accuracy when using low values of `ignore_thresh` or `truth_thresh`. Initially there were several wrong places, which I fixed:
1. There was `if (best_iou > l.ignore_thresh) {` used instead of `if (best_match_iou > l.ignore_thresh) {`: https://github.com/AlexeyAB/darknet/blame/dcfeea30f195e0ca1210d580cac8b91b6beaf3f7/src/yolo_layer.c#L355
Thus, it didn't decrease objectness even if there was an incorrect class_id. Now it decreases objectness if `detection_class_id != truth_class_id` - this improves accuracy if `ignore_thresh < 1.0`.

2. When `truth_thresh < 1.0`, the probability that many objects will correspond to one anchor increases. But in the original Darknet, only the last truth-bbox (from the label-txt-file) affected the anchor. I fixed it - now it averages the deltas of all truths which correspond to this one anchor - so `truth_thresh < 1.0` and `iou_thresh < 1.0` may have a better effect.

3. Also, a possible bug with MSE isn't tested and isn't fixed: https://github.com/AlexeyAB/darknet/issues/4594#issuecomment-569927386
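A rough sketch of the averaging in point 2 (names and structure are illustrative, not the exact darknet code):

```c
// When several truth boxes match the same anchor (possible once
// truth_thresh < 1.0 or iou_thresh < 1.0), the deltas from each matching
// truth are first accumulated into delta[] with +=, then averaged here,
// instead of letting the last truth in the label file overwrite the
// deltas written by the previous ones.
void average_anchor_deltas(float *delta, int n, int matches)
{
    if (matches > 1)
        for (int i = 0; i < n; ++i)
            delta[i] /= (float)matches;
}
```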
@AlexeyAB Happy New Year!
Here are the old `cpc` and new `cpc` results; it seems using a loss multiplier on all loss parts can balance the per-class AP slightly, but not improve it:
Model | mAP@.5(C0/C1) | mAP@.75(C0/C1) |
---|---|---|
giou | 79.53%(69.24%/89.83%) | 59.65%(42.96%/76.34%) |
giou,cpc | 79.51% (69.07%/89.96%) | 59.52%(42.17%/76.87%) |
giou,cpc(new) | 79.44%(70.03%/88.84%) | 59.61%(44.95%/74.27%) |
> I mean it may be better to use for your dataset: iou_thresh = 1.0. While for MS COCO it may be better to use: iou_thresh = 0.213

Actually, in my dataset `iou_thresh = 0.213` always gets better results. I think a lower `iou_thresh` allows several anchors to predict the same object, while the original darknet uses only the nearest anchor to predict an object, which limits yolo's ability; so a lower `iou_thresh` will always get better results, one just needs to search for a suitable value for a given dataset.
> What methods do you mean?

Some methods use e.g. `ignore_thresh = 0.5` & `truth_thresh = 0.7`, which means:

- `iou < 0.5`: negative sample
- `0.5 < iou < 0.7`: ignored
- `iou > 0.7`: positive sample

I'm not sure this is exactly yolo's `ignore_thresh` and `truth_thresh`.
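As a sketch, that dual-threshold assignment versus yolo's single threshold could look like this (illustrative; the exact rules vary per method):

```c
// Dual-threshold anchor assignment, as in the methods described above:
//   iou < ignore_thresh                  -> negative sample (objectness -> 0)
//   ignore_thresh <= iou < truth_thresh  -> ignored (no objectness gradient)
//   iou >= truth_thresh                  -> positive sample (objectness -> 1)
// Yolo instead takes positives from the best-matching anchor only and uses a
// single ignore_thresh to mute the loss of other well-overlapping anchors.
typedef enum { NEGATIVE, IGNORED, POSITIVE } assignment;

static assignment assign_anchor(float iou, float ignore_thresh,
                                float truth_thresh)
{
    if (iou < ignore_thresh) return NEGATIVE;
    if (iou < truth_thresh)  return IGNORED;
    return POSITIVE;
}
```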
@Kyuuki93

> it seems using a loss multiplier on all loss parts can balance the per-class AP slightly, but not improve it

Yes.

> Some methods use e.g. ignore_thresh = 0.5 & truth_thresh = 0.7, which means: iou < 0.5: negative sample; 0.5 < iou < 0.7: ignored; iou > 0.7: positive sample

Yes. `truth_thresh` is very similar to (but not the same as) `iou_thresh`, so it is strange that you get better results with a higher `truth_thresh` and a lower `iou_thresh`.
For MS COCO, `iou_thresh=0.213` greatly increases accuracy.
@WongKinYiu @Kyuuki93 I am adding a new version of [shortcut]; I am re-making the [shortcut] layer for fast BiFPN: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-569197177
So be careful when using commits from Jan 7, 2020 - they may have bugs in the [shortcut] layer. Before using them, try to train a small model with a [shortcut] layer.
@AlexeyAB
Okay, thanks.
@AlexeyAB ok, thanks
@Kyuuki93 @WongKinYiu I added a new version of the `[shortcut]` layer for BiFPN from EfficientDet: https://github.com/AlexeyAB/darknet/issues/4662
So you can try to make a Detector with 1 or several BiFPN blocks, and with 1 ASFF + several BiFPN blocks (yolov3-spp-asff-bifpn-db-it.cfg).
@nyj-ocean

```
[convolutional]
stride=1
size=1
filters=4
activation=normalize_channels_softmax

[route]
layers=-1
group_id=0
groups=4

...

[route]
layers=-1
group_id=3
groups=4
```
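For context: the conv produces 4 per-pixel weight maps normalized by a softmax across channels, and each `[route] groups=4 group_id=i` slices out the i-th map so it can scale branch i. A minimal C sketch of the assumed ASFF-style fusion:

```c
// ASFF-style fusion: weights[k] is the k-th softmax weight map (W*H values,
// one per pixel) and branch[k] is the k-th feature map (C x W x H, CHW).
// out(c, i) = sum_k weights[k][i] * branch[k][c][i]
void asff_fuse(const float **branch, const float **weights, float *out,
               int n, int c, int wh)
{
    for (int ch = 0; ch < c; ++ch)
        for (int i = 0; i < wh; ++i) {
            float v = 0.f;
            for (int k = 0; k < n; ++k)
                v += weights[k][i] * branch[k][ch * wh + i];
            out[ch * wh + i] = v;
        }
}
```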
@nyj-ocean It is because the 4-th branch has 4 = (2x2) times more outputs, as it works at twice the spatial resolution. So you should use half as many filters in the conv-layers.
@AlexeyAB
I reduced the value of `filters` in some `[convolutional]` layers, but the FPS of `yolov3-4l+ASFF.cfg` is still slower than `yolov3-4l.cfg`. I am waiting to see whether the final mAP of `yolov3-4l+ASFF.cfg` increases or not compared with `yolov3-4l.cfg`.
By the way, I want to try ASFF + several BiFPN; where could I download the `yolov3-spp-asff-bifpn-db-it.cfg` from https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-572760285?
Learning Spatial Fusion for Single-Shot Object Detection
@AlexeyAB it seems worth taking a look.