Closed Kyuuki93 closed 3 years ago
@Kyuuki93 Nice!
It seems that only ASFF-softmax works well.
Now you can try to add DropBlock https://github.com/AlexeyAB/darknet/issues/4498 + RFB-block https://github.com/AlexeyAB/darknet/issues/4507
I fixed the gradient of activation=normalize_channels in the same way as it is done for normalize_channels_softmax. You can also try the new version of activation=normalize_channels: https://github.com/AlexeyAB/darknet/commit/7ae1ae5641b549ebaa5c816701c4b9ca73247a65
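For reference, ASFF-style normalize_channels_softmax fuses the rescaled feature maps with per-pixel weights that are softmax-normalized across the input levels. A minimal pure-Python sketch of that idea (function names are illustrative, not darknet's actual API):

```python
import math

def normalize_channels_softmax(weights):
    # Softmax across the N fusion-weight channels at one spatial position,
    # so the per-level weights are positive and sum to 1.
    m = max(weights)                       # subtract max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def asff_fuse(level_feats, raw_weights):
    # Fuse one pixel's features from several pyramid levels as a convex
    # combination using the softmax-normalized weights.
    alphas = normalize_channels_softmax(raw_weights)
    return sum(a * f for a, f in zip(alphas, level_feats))
```

Because the weights form a convex combination, the fused value always stays inside the range of the input features, which the plain (non-softmax) normalization does not guarantee.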
Here is another table:
Model | AP@.5 | AP@.75 |
---|---|---|
spp,mse,it=0.213,asff(softmax) | 92.45% | 61.83% |
spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
spp,mse,it=0.213,asff(softmax),gs | 92.08% | 62.19% |
spp,giou,it=0.213,asff(softmax),gs | 91.13% | 60.82% |
@AlexeyAB It seems Gaussian_yolo is not suited for ASFF
I got an error after adding DropBlock:
Resizing
480 x 480
...
realloc(): invalid next size
Aborted (core dumped)
@AlexeyAB
@Kyuuki93
It seems Gaussian_yolo is not suited for ASFF
Try Gaussian_yolo with MSE.
@Kyuuki93
I got an error after adding DropBlock
Can you share your cfg-file?
Sorry, wrong click.
Here is yolov3-spp-asff-db-it.cfg.txt
Try Gaussian_yolo with MSE.
Running it now
@Kyuuki93 I fixed it: https://github.com/AlexeyAB/darknet/commit/f831835125d3181b03aa28134869099c82ca846e#diff-aad5e97a835cccda531d59ffcdcee9f1R542
I got an error after adding DropBlock
Resizing 480 x 480
Try Gaussian_yolo with MSE.
Updated in the previous table
@Kyuuki93 Nice! Does DropBlock work well now?
For now,
Model | AP@.5 | AP@.75 |
---|---|---|
spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
spp,giou,it=0.213,asff(softmax),dropblock | 91.66% | 62.42% |
But judging from this chart, this net seems to need longer training, so I changed
max_batches = 25200 -> 45200
steps = 10000,15000,20000 -> 20000,30000,40000
and took another run.
Also, I added rfb after dropblock, and results will come in the following days.
I got another question here. I have 2 custom datasets: the 1st has 30k images for training and 4.5k images for validation, one class, and is the one used to produce the previous results; the 2nd has 100k images for training and 20k images for validation, also one class, and contains many small objects.
For example, using the same network, e.g. yolov3-spp.cfg, I can achieve
88.48% AP@.5 on the 1st dataset and
91.50% AP@.5 on the 2nd dataset.
After this I merged those datasets into a two-class dataset (1st to 1st class, 2nd to 2nd class), again with the same network, of course changing filters before the yolo layers, and then I got this result:
AP@.5
71.30% for class 0
90.11% for class 1
So, from the results, this merge decreased AP for both classes, but I don't know what caused it. What's your suggestion?
@Kyuuki93
But judging from this chart, this net seems to need longer training, so I changed
max_batches = 25200 -> 45200, steps = 10000,15000,20000 -> 20000,30000,40000
If you want a smooth decrease in accuracy, then just use SGDR (cosine lr schedule) from Bag of Freebies: https://github.com/AlexeyAB/darknet/issues/3272#issuecomment-497149618
Just use
learning_rate=0.0005
burn_in=4000
max_batches = 25200
policy=sgdr
instead of
learning_rate=0.0005
burn_in=4000
max_batches = 25200
policy=steps
steps=10000,15000,20000
scales=.1,.1,.1
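Roughly, policy=sgdr replaces the step drops with a cosine decay of the learning rate. A hedged sketch of such a schedule using the cfg numbers above (the exact darknet formula, including its burn-in exponent and restart handling, may differ):

```python
import math

def sgdr_lr(iteration, base_lr=0.0005, burn_in=4000, max_batches=25200, cycle=None):
    # Burn-in: ramp the learning rate up over the first burn_in iterations
    # (darknet uses a power ramp; the exponent here is an assumption).
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** 4
    if cycle is None:
        # No sgdr_cycle: one long cosine decay over the rest of training.
        t = min(iteration - burn_in, max_batches - burn_in)
        span = max_batches - burn_in
    else:
        # With sgdr_cycle: the cosine decay restarts every `cycle` iterations.
        t = (iteration - burn_in) % cycle
        span = cycle
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / span))
```

With no cycle the rate decays smoothly from base_lr at the end of burn-in down to 0 at max_batches, instead of dropping by 10x at each step boundary.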
AP@.5: 71.30% for class 0, 90.11% for class 1. So, from the results, this merge decreased AP for both classes, but I don't know what caused it. What's your suggestion?
So dataset-0 has class-0, and dataset-1 has class-1. Are there unlabeled objects of class-1 in dataset-0? Are there unlabeled objects of class-0 in dataset-1? All objects must be labeled; this is mandatory.
Also, in general: more classes, worse accuracy.
So dataset-0 has class-0, and dataset-1 has class-1.
Yes
Are there unlabeled objects of class-1 in dataset-0? Are there unlabeled objects of class-0 in dataset-1?
No, I can ensure that.
Also, in general: more classes, worse accuracy.
From results in many fields, yes, but what leads to that?
From results in many fields, yes, but what leads to that?
Any model has a limited capacity, and the more classes, the fewer features specific for each class can fit in the model. Classes compete for model capacity.
Any model has a limited capacity, and the more classes, the fewer features specific for each class can fit in the model. Classes compete for model capacity.
class-0 has 20k instances and class-1 has more than 80k instances, so class-1 gets average accuracy because there is more data to grab model capacity, while class-0, conversely, gets worse
@Kyuuki93 Maybe yes.
I added fix: https://github.com/AlexeyAB/darknet/commit/e33ecb785ee288fca1fe50326f5c7b039a9f5a11
So now you can try to set in each [yolo] / [Gaussian_yolo] layer the parameter:
counters_per_class=20000, 80000
And train.
It will use multipliers for delta_class during training:
`4`x for class-0
`1`x for class-1

@Kyuuki93 I found and fixed a bug in counters_per_class=
https://github.com/AlexeyAB/darknet/commit/e43a1c424d9a20b8425d8dd8f240867f2522df3f
So now you can try to set in each [yolo] / [Gaussian_yolo] layer the parameter: counters_per_class=20000, 80000

Does 20000 or 80000 mean the number of class instances?

It will use multipliers for delta_class during training: `4`x for class-0, `1`x for class-1

Are `4` and `1` the ratio calculated from 20000 and 80000? If yes, would setting counters_per_class=1, 4 be the same as counters_per_class=20000, 80000?
Btw, dropblock, precisely with fixed size = 7, did not get a higher AP, even with longer training; maybe gradually increasing size as 1, 5, 7 will get a better result. I will try this later.
@AlexeyAB does counters_per_class work outside of Gaussian_yolo? Is it a way to solve the class imbalance problem? Would you explain that commit and its use case?
@Kyuuki93
Does 20000 or 80000 mean the number of class instances? Are 4 and 1 the ratio calculated from 20000 and 80000? If yes, would setting counters_per_class=1, 4 be the same as counters_per_class=20000, 80000?
Yes.
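So, as confirmed above, only the ratio between the counters matters. A tiny illustrative sketch of the multiplier rule (not darknet's actual code):

```python
def class_multipliers(counters_per_class):
    # Scale each class's delta_class by max_count / own_count, so the rarer
    # class gets a proportionally larger gradient during training.
    m = max(counters_per_class)
    return [m / c for c in counters_per_class]
```

Both counters_per_class=20000, 80000 and counters_per_class=1, 4 yield the same 4x / 1x multipliers.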
Btw, dropblock, precisely with fixed size = 7, did not get a higher AP, even with longer training,
DropBlock with size=7 should be used only with RFB-block as I described above.
maybe gradually increasing size as 1, 5, 7 will get a better result, I will try this later
The current implementation of DropBlock will gradually increase size from 1 to maxsize=7. In the implementation from the paper, maxsize=7 is used for all 3 DropBlocks. But you can try using different maxsizes (1, 5, 7) for the different DropBlocks.
@isgursoy
does counters_per_class work outside of Gaussian_yolo? Is it a way to solve the class imbalance problem? Would you explain that commit and its use case?
Yes. It is an experimental feature. Just set the number of objects in the training dataset for each class.
@AlexeyAB
@AlexeyAB So dropblock with size=7 should not be used without rfb.

Current implementation of DropBlock will gradually increase size from 1 to maxsize=7.

When will the size increase? Is it based on current_iters / max_batches?
And here is another table; some results were copied from the previous table for convenient comparison.
Model | AP@.5 | AP@.75 |
---|---|---|
spp,mse,it=0.213,asff(softmax) | 92.45% | 61.83% |
spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
spp,giou,it=0.213,asff(softmax),dropblock(size=7) | 91.66% | 62.42% |
spp,giou,it=0.213,asff(softmax),rfb(bn=0) | 91.57% | 64.95% |
spp,giou,it=0.213,asff(softmax),rfb(bn=1) | 92.32% | 60.05% |
spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=0) | 91.65% | 61.35% |
spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) | 92.12% | 63.88% |
spp,giou,it=0.213,asff(logistic),dropblock(size=7),rfb(bn=1) | 91.78% | 60.10% |
spp,giou,it=0.213,asff(relulike),dropblock(size=7),rfb(bn=1) | 91.43% | 61.11% |
spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1),sgdr | 91.54% | 62.67% |
spp,giou,it=0.213,asff(softmax),dropblock(size=3,5,7),rfb(bn=1) | 90.93% | 61.20% |
(?) This is the rfb cfg file; I think it's right, but you can check to be sure: yolov3-spp-asff-it-rfb.cfg.txt
Btw, yolact was updated these days to yolact++; this discussion may move to its reference issue.
AugFPN: Improving Multi-scale Feature Learning for Object Detection
paper https://arxiv.org/pdf/1912.05384.pdf
very similar to ASFF
@Kyuuki93
(?) this is rfb cfg file, I think its right, you can check for sure yolov3-spp-asff-it-rfb.cfg.txt
It seems this is correct.
Also you can try to train the RFB-block with batch_normalize=1; it may have higher accuracy: https://github.com/ruinmessi/ASFF/issues/46
When will the size increase? Is it based on current_iters / max_batches?

Yes, multiplier = current_iters / (max_batches / 2), as in the paper: https://github.com/AlexeyAB/darknet/blob/3004ee851c49e28a32fd60f2ae4a1ddf95b8b391/src/dropout_layer_kernels.cu#L31
https://github.com/AlexeyAB/darknet/blob/3004ee851c49e28a32fd60f2ae4a1ddf95b8b391/src/dropout_layer_kernels.cu#L39-L40
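So the ramp is linear over the first half of training. A small sketch of that schedule as I read the linked code (the exact rounding of the block size is my assumption):

```python
def dropblock_schedule(iteration, max_batches, max_size=7):
    # The multiplier grows linearly from 0 to 1 until max_batches / 2, then
    # stays at 1, so the block size reaches max_size halfway through training.
    multiplier = min(1.0, iteration / (max_batches / 2.0))
    size = max(1, round(max_size * multiplier))
    return multiplier, size
```

With max_batches=45200 the block size grows from 1 at the start to 7 by iteration 22600 and stays there for the rest of training.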
Btw, yolact was updated these days to yolact++; this discussion may move to its reference issue.
Thanks! I will look at it: https://github.com/AlexeyAB/darknet/issues/3048#issuecomment-567017091
Also you can try to train the RFB-block with batch_normalize=1; it may have higher accuracy: ruinmessi/ASFF#46
I have seen your discussion with ASFF's author before; a comparison of rfb(bn=0) with rfb(bn=1) was already on the schedule.
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Another work on FPN connection manipulation; it seems researchers have recently paid more attention to the connection method of FPN.
It is not clear what is better: ASFF, BiFPN, or these AugFPN, SpineNet...
AugFPN: Improving Multi-scale Feature Learning for Object Detection
They compare their network with Yolov2 in 2019, and don't compare speed/BFLOPS. But maybe we should read it:
By replacing FPN with AugFPN in Faster R-CNN, our models achieve 2.3 and 1.6 points higher Average Precision (AP) when using ResNet50 and MobileNet-v2 as backbone respectively. Furthermore, AugFPN improves RetinaNet by 1.6 points AP and FCOS by 0.9 points AP when using ResNet50 as backbone.
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
paper https://arxiv.org/pdf/1912.05027.pdf Another work on FPN connection manipulation; it seems researchers have recently paid more attention to the connection method of FPN.
They compare AP / Bflops, but don't compare AP / FPS. This is usually done when the network is slow. But maybe we should read the article.
SpineNet achieves state-of-the-art performance of one-stage object detector on COCO with 60% less computation, and outperforms ResNet-FPN counterparts by 6% AP. SpineNet architecture can transfer to classification tasks, achieving 6% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset.
It is not clear what is better: ASFF, BiFPN, or these AugFPN, SpineNet
Yes, we need to read more implementation details; then we can compare them in this framework. In my understanding, their high AP with slow speed results from a heavy backbone; all their ideas are about how to connect the FPN, which only requires changing the network to implement, so they could be useful for yolo.
I will take a deep look then.
@AlexeyAB updated Table here https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-567010280
It seems rfb bn=1 is better than rfb bn=0, and dropblock (7,7,7) is better than dropblock (3,5,7).
Also, maybe this one-class dataset is too easy, making it hard to show the validity of these improvements; perhaps I should run all the experiments again on the two-class dataset mentioned before.
@Kyuuki93 Well.
The current implementation of dropblock (7,7,7) in Darknet will grow the dropblock from (1,1,1) initially to (7,7,7) at max_batches/2 iterations, so this is much closer to the article.
It is very strange that dropblock drops AP@.75. Maybe with dropblock the model should be trained for more iterations.
Show the cfg-file and chart.png file of spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1)
Do you use sgdr-lr-policy or step-lr-policy, and how many iterations did you train?
It is very strange that dropblock drops AP@.75. Maybe with dropblock the model should be trained for more iterations.
Doubling the training iterations did not get a higher AP.
Show the cfg-file and chart.png file of
spp,giou,it=0.213,asff(softmax),dropblock(size=7) ,rfb(bn=1)
Sorry, I checked it but did not keep it; it just looked like https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-566509160
Do you use sgdr-lr-policy or step-lr-policy, and how many iterations did you train?
It is the step-lr-policy.
@AlexeyAB I completed the previous table:

Model | AP@.5 | AP@.75 |
---|---|---|
spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) | 92.12% | 63.88% |
Maybe the improvement of the rfb block was based on the guiding anchor.
@Kyuuki93 Thanks!
Did you try to test spp,giou,it=0.213,asff(softmax),rfb(bn=1) (RFB with batch-norm, but without dropblock)?
In sgdr, did you use policy=sgdr or policy=sgdr with sgdr_cycle=1000?
- Did you try to test spp,giou,it=0.213,asff(softmax),rfb(bn=1) (RFB with batch-norm, but without dropblock)?

Yes, it is still training; this is the last experiment on the class-0 dataset, and the following experiments will use the two-class dataset.

- In sgdr, did you use policy=sgdr or policy=sgdr with sgdr_cycle=1000?

Only policy=sgdr. How should the number for sgdr_cycle be chosen? Here is the chart of training with sgdr; is the part in the blue circle normal for sgdr, or does it just need longer training?
Only policy=sgdr.

That's right!

How should the number for sgdr_cycle be chosen?

sgdr_cycle = num_images_in_train_txt / batch - but you should choose max_batches so that it falls at the end of a cycle, so don't use it.
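To illustrate that advice: if sgdr_cycle equals one epoch (images / batch), max_batches can be rounded up to a whole number of cycles so training ends exactly at a restart boundary. A hypothetical helper, not part of darknet:

```python
def align_max_batches(num_images, batch, desired_max_batches):
    # sgdr_cycle = one epoch in iterations; round max_batches up to the
    # nearest multiple of the cycle so it falls at the end of a cycle.
    cycle = num_images // batch
    n_cycles = -(-desired_max_batches // cycle)  # ceiling division
    return cycle, n_cycles * cycle
```

For example, with the 30k-image dataset and batch=64, a cycle is 468 iterations, so max_batches=25200 would be rounded up to 25272.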
Here is the chart of training with sgdr; is the part in the blue circle normal for sgdr, or does it just need longer training?
It seems it should be trained longer.
@AlexeyAB The spp,giou,it=0.213,asff(softmax),rfb(bn=1) training finished; the chart is nearly flat after 8k iterations.
@Kyuuki93
Yes, firstly AP50 stops improving, and later AP75 stops.
It is weird that rfb(bn=0) is better than rfb(bn=1) for AP75.
In general I think spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) is the best model. Try it on the multiclass dataset.
On the two-class dataset, all models are based on yolov3-spp.cfg with iou_thresh=0.213 and giou with iou_normalizer=0.5; all are trained with the same batch_size and step-lr-policy. C0/C1 means class-0 and class-1, which include 20k and 80k instances respectively. This table may take two weeks; results will be updated gradually.
Model | mAP@.5(C0/C1) | mAP@.75(C0/C1) |
---|---|---|
mse | | |
giou | 79.53% (69.24%/89.83%) | 59.65% (42.96%/76.34%) |
giou,cpc | 79.51% (69.07%/89.96%) | 59.52% (42.17%/76.87%) |
giou,asff,rfb | | |
giou,asff,dropblock,rfb | | |
`cpc` means counters_per_class; `asff` defaults to softmax; `rfb` defaults to bn=1; `dropblock` defaults to size=7.
asff defaults to softmax

Also try asff without softmax, at least for 1 class, just to know whether it is implemented correctly and whether we should use it for the BiFPN implementation.
asff defaults to softmax

Also try asff without softmax, at least for 1 class, just to know whether it is implemented correctly and whether we should use it for the BiFPN implementation.
Ok, I will do it for 1 class which would be faster
@AlexeyAB The previous table is now complete. But I think ASFF norms channels only once, while BiFPN can norm channels many times; maybe that's why BiFPN can use a simple norm function.
@Kyuuki93 Yes, maybe many fusions compensate for the disadvantages of norm_channels with ReLU, since there are many BiFPN blocks with many norm_channels in each BiFPN-block in EfficientDet.
So the best models from: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-567010280
Model | AP@.5 | AP@.75 |
---|---|---|
spp,mse,it=0.213,asff(softmax) | 92.45% | 61.83% |
spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
spp,giou,it=0.213,asff(softmax),rfb(bn=0) | 91.57% | 64.95% |
spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) | 92.12% | 63.88% |
@Kyuuki93
I would suggest using rfb(bn=1) instead of rfb(bn=0) in your new experiments with 2 classes: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-568184405
@Kyuuki93
Also, since we don't use Deformable-conv, you can try to use the RFB-block with a flexible receptive field from 1x1 to 11x11: https://github.com/AlexeyAB/darknet/issues/4507#issuecomment-568296011
You can test with one class too.
@AlexeyAB It seems counters_per_class does not work out the data imbalance issue. Compare with class weights in classification problems: there the loss is produced by class only, but in detection the loss is produced by class, location, and even objectness, and the loss from loc and obj is not relevant to the class label, so a loss multiplier cannot work this out. Actually, in my dataset, the class of a box gets high accuracy; it seems the model just can't find class-0's objects, which lack data in the training dataset.
Also since we don't use Deformable-conv, then you can try to use RFB-block with flexible receptive field from 1x1 to 11x11: #4507 (comment)
By changing activation = linear to activation = leaky?
Learning Spatial Fusion for Single-Shot Object Detection
@AlexeyAB it seems worth taking a look