AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

ASFF - Learning Spatial Fusion for Single-Shot Object Detection - 63% mAP@0.5 with 45.5FPS #4382

Closed. Kyuuki93 closed this issue 3 years ago.

Kyuuki93 commented 4 years ago

Learning Spatial Fusion for Single-Shot Object Detection


@AlexeyAB it seems worth taking a look

AlexeyAB commented 4 years ago

@Kyuuki93 Nice!

It seems that only ASFF-softmax works well.

Now you can try to add DropBlock https://github.com/AlexeyAB/darknet/issues/4498 + RFB-block https://github.com/AlexeyAB/darknet/issues/4507


I fixed the gradient of activation=normalize_channels in the same way as it is done for normalize_channels_softmax, so you can also try the new version of activation=normalize_channels: https://github.com/AlexeyAB/darknet/commit/7ae1ae5641b549ebaa5c816701c4b9ca73247a65
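For context, this is the fusion ASFF performs (as defined in the ASFF paper; notation condensed): each output level $l$ combines the three rescaled feature maps with per-pixel weights that are normalized across the three inputs,

$$y^l = \alpha^l \cdot x^{1 \to l} + \beta^l \cdot x^{2 \to l} + \gamma^l \cdot x^{3 \to l}, \qquad \alpha^l + \beta^l + \gamma^l = 1,$$

where, e.g., $\alpha^l = e^{\lambda^l_\alpha} / (e^{\lambda^l_\alpha} + e^{\lambda^l_\beta} + e^{\lambda^l_\gamma})$. This per-pixel softmax across channels is what activation=normalize_channels_softmax computes; activation=normalize_channels is the simpler ReLU-based variant (the relulike entries in the tables below).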

Kyuuki93 commented 4 years ago

Here is another table:

| Model | AP@.5 | AP@.75 |
|---|---|---|
| spp,mse,it=0.213,asff(softmax) | 92.45% | 61.83% |
| spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
| spp,mse,it=0.213,asff(softmax),gs | 92.08% | 62.19% |
| spp,giou,it=0.213,asff(softmax),gs | 91.13% | 60.82% |

@AlexeyAB It seems Gaussian_yolo is not suited for ASFF

Kyuuki93 commented 4 years ago

I got an error after adding DropBlock:

Resizing
480 x 480
...
realloc(): invalid next size
Aborted (core dumped)

@AlexeyAB

AlexeyAB commented 4 years ago

@Kyuuki93

It seems Gaussian_yolo is not suited for ASFF

Try Gaussian_yolo with MSE.

AlexeyAB commented 4 years ago

@Kyuuki93

I got an error after adding DropBlock

Can you share your cfg-file?

Kyuuki93 commented 4 years ago

Sorry, wrong click.

Here is yolov3-spp-asff-db-it.cfg.txt

Kyuuki93 commented 4 years ago

Try Gaussian_yolo with MSE.

Running it now

AlexeyAB commented 4 years ago

@Kyuuki93 I fixed it: https://github.com/AlexeyAB/darknet/commit/f831835125d3181b03aa28134869099c82ca846e#diff-aad5e97a835cccda531d59ffcdcee9f1R542

I got an error after adding DropBlock

Resizing 480 x 480

Kyuuki93 commented 4 years ago

Try Gaussian_yolo with MSE.

Updated in the previous table.

AlexeyAB commented 4 years ago

@Kyuuki93 Nice! Does DropBlock work well now?

Kyuuki93 commented 4 years ago

@Kyuuki93 Nice! Does DropBlock work well now?

For now,

| Model | AP@.5 | AP@.75 |
|---|---|---|
| spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
| spp,giou,it=0.213,asff(softmax),dropblock | 91.66% | 62.42% |

But the chart suggests this net needs longer training (chart),

so I changed

max_batches = 25200 -> 45200
steps = 10000,15000,20000 -> 20000,30000,40000

to take another run.

Also, I added RFB after DropBlock; results will come in a few days.

Kyuuki93 commented 4 years ago

I have another question. I have 2 custom datasets: the 1st has 30k training images and 4.5k validation images, one class, and is the one used to produce the previous results; the 2nd has 100k training images and 20k validation images, also one class, and contains many small objects.

For example, using the same network, e.g. yolov3-spp.cfg, I can achieve

88.48% AP@.5 in 1st dataset and 
91.50% AP@.5 in 2nd dataset

After this I merged those datasets into a two-class dataset (1st dataset to 1st class, 2nd dataset to 2nd class), using the same network (of course changing filters before the yolo layers), and got this result:

| Class | AP@.5 |
|---|---|
| class 0 | 71.30% |
| class 1 | 90.11% |

So the merge decreased AP for both classes, but I don't know what caused this. What's your suggestion?

AlexeyAB commented 4 years ago

@Kyuuki93

But the chart suggests this net needs longer training, so I changed

max_batches = 25200 -> 45200
steps = 10000,15000,20000 -> 20000,30000,40000

If you want a smooth decrease in accuracy, then just use SGDR (cosine lr schedule) from Bag of Freebies: https://github.com/AlexeyAB/darknet/issues/3272#issuecomment-497149618

Just use

learning_rate=0.0005
burn_in=4000
max_batches = 25200
policy=sgdr

instead of

learning_rate=0.0005
burn_in=4000
max_batches = 25200
policy=steps
steps=10000,15000,20000
scales=.1,.1,.1
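
For reference, policy=sgdr implements the cosine-annealing ("warm restarts") schedule from the SGDR paper by Loshchilov & Hutter; the standard formula (not specific to darknet) is

$$\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\tfrac{T_{cur}}{T_i}\pi\right)\right)$$

where $T_i$ is the cycle length in iterations and $T_{cur}$ is the number of iterations since the last restart, so the learning rate decays smoothly from $\eta_{max}$ to $\eta_{min}$ within each cycle.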

AlexeyAB commented 4 years ago

AP@.5: 71.30% for class 0, 90.11% for class 1. So the merge decreased AP for both classes, but I don't know what caused this. What's your suggestion?

So dataset-0 has class-0, and dataset-1 has class-1. Are there unlabeled objects of class-1 in dataset-0? Are there unlabeled objects of class-0 in dataset-1? All objects must be labeled; this is mandatory.

Also, in general: more classes means worse accuracy.

Kyuuki93 commented 4 years ago

So dataset-0 has class-0, and dataset-1 has class-1.

Yes

Are there unlabeled objects of class-1 in dataset-0? Are there unlabeled objects of class-0 in dataset-1?

No, I can ensure that.

Also, in general: more classes means worse accuracy.

From results in many fields, yes, but what leads to that?

AlexeyAB commented 4 years ago

From results in many fields, yes, but what leads to that?

Any model has a limited capacity, and the more classes, the fewer features specific for each class can fit in the model. Classes compete for model capacity.

Kyuuki93 commented 4 years ago

Any model has a limited capacity, and the more classes, the fewer features specific for each class can fit in the model. Classes compete for model capacity.

class-0 has 20k instances and class-1 has more than 80k instances, so class-1 keeps near its original accuracy because it has more data to grab model capacity, while class-0 gets worse.

AlexeyAB commented 4 years ago

@Kyuuki93 May be yes.

I added fix: https://github.com/AlexeyAB/darknet/commit/e33ecb785ee288fca1fe50326f5c7b039a9f5a11

So now you can try to set in each [yolo] / [Gaussian_yolo] layer the parameter counters_per_class=20000, 80000 and train. It will use multipliers for delta_class during training.
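
A minimal cfg sketch of where this goes (the counter values are from this thread; the other [yolo] fields are the usual ones from a 2-class yolov3-spp.cfg and are shown only for context):

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9
jitter=.3
ignore_thresh = .7
counters_per_class=20000, 80000   # labeled instances of class-0 and class-1 in the training set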

AlexeyAB commented 4 years ago

@Kyuuki93

I found a bug and fixed it in counters_per_class= https://github.com/AlexeyAB/darknet/commit/e43a1c424d9a20b8425d8dd8f240867f2522df3f

Kyuuki93 commented 4 years ago

So now you can try to set in each [yolo] / [Gaussian_yolo] layer the parameter counters_per_class=20000, 80000

Do 20000 and 80000 mean the number of class instances?

It will use multipliers for delta_class during training

* `4` x for class-0

* `1` x for class-1

Are 4 and 1 the ratios calculated from 20000 and 80000? If yes, would counters_per_class=1, 4 be the same as counters_per_class=20000, 80000?

Kyuuki93 commented 4 years ago

Btw, DropBlock with fixed size = 7 did not get a higher AP, even with longer training. Maybe gradually increasing the size as 1, 5, 7 will get a better result; I will try this later.

isgursoy commented 4 years ago

@AlexeyAB does counters_per_class work outside of Gaussian_yolo? Is it a way to solve the class imbalance problem? Would you explain that commit and its use case?

AlexeyAB commented 4 years ago

@Kyuuki93

Do 20000 and 80000 mean the number of class instances? Are 4 and 1 the ratios calculated from 20000 and 80000? If yes, would counters_per_class=1, 4 be the same as counters_per_class=20000, 80000?

Yes.
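
To make that concrete, here is a small C sketch of how the per-class multipliers could be derived from counters_per_class, consistent with the 4x / 1x values quoted above (my reading of the behavior, not the verbatim darknet source):

#include <stdio.h>

/* Multiplier for class c: ratio of the largest counter to this class's counter.
   counters_per_class=20000,80000 -> 4x for class-0, 1x for class-1,
   and counters_per_class=1,4 gives the same ratios. */
float class_multiplier(const int *counters, int classes, int c) {
    int max_count = 0;
    for (int i = 0; i < classes; ++i)
        if (counters[i] > max_count) max_count = counters[i];
    return (float)max_count / (float)counters[c];
}

int main(void) {
    int counters[2] = { 20000, 80000 };
    printf("class-0: %.0fx, class-1: %.0fx\n",
           class_multiplier(counters, 2, 0),   /* 4x */
           class_multiplier(counters, 2, 1));  /* 1x */
    return 0;
}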

Btw, DropBlock with fixed size = 7 did not get a higher AP, even with longer training,

DropBlock with size=7 should be used only with RFB-block as I described above.

maybe gradually increasing the size as 1, 5, 7 will get a better result; I will try this later

The current implementation of DropBlock gradually increases the size from 1 to maxsize=7. In the implementation from the paper, maxsize=7 is used for all 3 DropBlocks. But you can try using different maxsizes (1, 5, 7) for the different DropBlocks, as sketched below.
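
One cfg sketch of such a DropBlock layer. Note: the parameter names (dropblock=1, dropblock_size_abs=) follow my recollection of the DropBlock feature from issue #4498 and should be treated as an assumption, not verified syntax:

[dropout]
probability=.1
dropblock=1
dropblock_size_abs=7    # maxsize; try 1 / 5 / 7 in the three DropBlock layers to vary them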

AlexeyAB commented 4 years ago

@isgursoy

does counters_per_class work outside of Gaussian_yolo? Is it a way to solve the class imbalance problem? Would you explain that commit and its use case?

Yes. It is an experimental feature. Just set the number of objects in the training dataset for each class.

Kyuuki93 commented 4 years ago

@AlexeyAB So DropBlock with size=7 should not be used without RFB.

The current implementation of DropBlock gradually increases the size from 1 to maxsize=7.

When does the size increase? Is it based on current_iters / max_batches?

And here is another table; some results were copied from the previous table for convenient comparison:

| Model | AP@.5 | AP@.75 |
|---|---|---|
| spp,mse,it=0.213,asff(softmax) | 92.45% | 61.83% |
| spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=7) | 91.66% | 62.42% |
| spp,giou,it=0.213,asff(softmax),rfb(bn=0) | 91.57% | 64.95% |
| spp,giou,it=0.213,asff(softmax),rfb(bn=1) | 92.32% | 60.05% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=0) | 91.65% | 61.35% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) | 92.12% | 63.88% |
| spp,giou,it=0.213,asff(logistic),dropblock(size=7),rfb(bn=1) | 91.78% | 60.10% |
| spp,giou,it=0.213,asff(relulike),dropblock(size=7),rfb(bn=1) | 91.43% | 61.11% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1),sgdr | 91.54% | 62.67% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=3,5,7),rfb(bn=1) | 90.93% | 61.20% |

This is the RFB cfg file; I think it's right, but you can check to be sure: yolov3-spp-asff-it-rfb.cfg.txt

Btw, YOLACT was updated to YOLACT++ these days; this discussion may move to its reference issue.

Kyuuki93 commented 4 years ago

AugFPN: Improving Multi-scale Feature Learning for Object Detection

paper https://arxiv.org/pdf/1912.05384.pdf


very similar to ASFF

AlexeyAB commented 4 years ago

@Kyuuki93

(?) this is rfb cfg file, I think its right, you can check for sure yolov3-spp-asff-it-rfb.cfg.txt

It seems this is correct. Also, you can try to train the RFB-block with batch_normalize=1; it may have higher accuracy: https://github.com/ruinmessi/ASFF/issues/46
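
For illustration, one dilated branch of an RFB-style block in darknet cfg with batch norm enabled; this is only a sketch of the bn=1 idea (the full block layout is in the yolov3-spp-asff-it-rfb.cfg.txt above), and dilation= assumes the dilated-convolution support discussed in issue #4507:

[convolutional]
batch_normalize=1       # the rfb(bn=1) variant discussed here
filters=256
size=3
stride=1
pad=1
dilation=2              # enlarged receptive field, RFB-style
activation=leaky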


When does the size increase? Is it based on current_iters / max_batches?

Yes, multiplier = current_iters / (max_batches / 2), as in the paper: https://github.com/AlexeyAB/darknet/blob/3004ee851c49e28a32fd60f2ae4a1ddf95b8b391/src/dropout_layer_kernels.cu#L31 https://github.com/AlexeyAB/darknet/blob/3004ee851c49e28a32fd60f2ae4a1ddf95b8b391/src/dropout_layer_kernels.cu#L39-L40
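
A small C sketch paraphrasing the linked lines (not the verbatim CUDA kernels): the effective block size ramps linearly from 1 and reaches max_size at max_batches/2 iterations:

/* DropBlock size schedule, as described above. */
int dropblock_size(int current_iters, int max_batches, int max_size) {
    float m = (float)current_iters / ((float)max_batches / 2.0f);
    if (m > 1.0f) m = 1.0f;                 /* saturate after half of training */
    int size = (int)(m * (float)max_size);
    return size < 1 ? 1 : size;             /* never smaller than 1x1 */
}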


Btw, yolact updated this days to yolact++, this discuss may move to it's reference issue

Thanks! I will look at it: https://github.com/AlexeyAB/darknet/issues/3048#issuecomment-567017091

Kyuuki93 commented 4 years ago

Also, you can try to train the RFB-block with batch_normalize=1; it may have higher accuracy: ruinmessi/ASFF#46

I saw your discussion with ASFF's author before; a comparison of rfb(bn=0) with rfb(bn=1) is on the schedule.

Kyuuki93 commented 4 years ago

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Another work on manipulating FPN connections; it seems researchers have recently paid more attention to how FPN levels are connected.

AlexeyAB commented 4 years ago

It is not clear what is better: ASFF, BiFPN, or these AugFPN, SpineNet...

AugFPN: Improving Multi-scale Feature Learning for Object Detection

paper https://arxiv.org/pdf/1912.05384.pdf

They compare their network with YOLOv2 in 2019, and don't compare speed/BFLOPS. But maybe we should read it:

By replacing FPN with AugFPN in Faster R-CNN, our models achieve 2.3 and 1.6 points higher Average Precision (AP) when using ResNet50 and MobileNet-v2 as backbone respectively. Furthermore, AugFPN improves RetinaNet by 1.6 points AP and FCOS by 0.9 points AP when using ResNet50 as backbone.


SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

paper https://arxiv.org/pdf/1912.05027.pdf

Another work on manipulating FPN connections; it seems researchers have recently paid more attention to how FPN levels are connected.

They compare AP / Bflops, but don't compare AP / FPS. This is usually done when the network is slow. But maybe we should read the article.

SpineNet achieves state-of-the-art performance of a one-stage object detector on COCO with 60% less computation, and outperforms ResNet-FPN counterparts by 6% AP. SpineNet architecture can transfer to classification tasks, achieving 6% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset.

Kyuuki93 commented 4 years ago

It is not clear what is better: ASFF, BiFPN, or these AugFPN, SpineNet

Yes, we need to read more implementation details; then it will be possible to compare them in this framework. In my understanding, their high AP at slow speed results from a heavy backbone; all their ideas are about how to connect the FPN, which only requires changing the network to implement, so they may be useful for YOLO.

Will take a deep look then

Kyuuki93 commented 4 years ago

@AlexeyAB updated the table here: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-567010280 It seems rfb(bn=1) is better than rfb(bn=0), and dropblock (7,7,7) is better than dropblock (3,5,7).

Kyuuki93 commented 4 years ago

Also, maybe this one-class dataset is too easy, making it hard to show the validity of these improvements; perhaps I should run all experiments again on the two-class dataset mentioned before.

AlexeyAB commented 4 years ago

@Kyuuki93 Well, the current implementation of dropblock (7,7,7) in Darknet will grow the DropBlock from (1,1,1) initially to (7,7,7) at max_batches/2 iterations, so it is already much closer to the article.

It is very strange that DropBlock drops AP@75. Maybe with DropBlock the model should be trained for more iterations.

Show the cfg-file and chart.png of spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1). Do you use the sgdr-lr-policy or the step-lr-policy, and for how many iterations did you train?

Kyuuki93 commented 4 years ago

It is very strange that DropBlock drops AP@75. Maybe with DropBlock the model should be trained for more iterations.

Doubling the training iterations did not get a higher AP.

Show the cfg-file and chart.png of spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1)

Sorry, I checked it but did not keep it; it looked just like https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-566509160

Do you use the sgdr-lr-policy or the step-lr-policy, and for how many iterations did you train?

It is the step-lr-policy.

Kyuuki93 commented 4 years ago

@AlexeyAB I filled in the previous table:

| Model | AP@.5 | AP@.75 |
|---|---|---|
| spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) | 92.12% | 63.88% |

maybe the improvement of the RFB block was based on guided anchoring

AlexeyAB commented 4 years ago

@Kyuuki93 Thanks!

Kyuuki93 commented 4 years ago

• Did you try to test spp,giou,it=0.213,asff(softmax),rfb(bn=1) (RFB with batch-norm, but without dropblock)?

Yes, it is still training; this is the last experiment on the class-0 dataset, and the following experiments will use the two-class dataset.

• In sgdr, did you use policy=sgdr or policy=sgdr sgdr_cycle=1000?

Only policy=sgdr. How to choose the number for sgdr_cycle=? This is the chart of training using sgdr; is the part in the blue circle normal for sgdr, or does it just need longer training? chart

AlexeyAB commented 4 years ago

Only policy=sgdr.

That's right!

how to choose the number for sgdr_cycle=?

sgdr_cycle = num_images_in_train_txt / batch, but then you should choose max_batches so that training ends exactly at the end of a cycle, so don't use it.
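
A hypothetical worked example (numbers invented for illustration, and assuming constant-length cycles):

# train.txt with 30,000 images, batch=64:
#   sgdr_cycle = 30000 / 64 ≈ 469
#   max_batches must then land on a cycle boundary, e.g. 28140 (= 60 * 469),
#   which is the bookkeeping being recommended against here
policy=sgdr
sgdr_cycle=469
max_batches=28140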

this is the chart of training using sgdr; is the part in the blue circle normal for sgdr, or does it just need longer training?

It seems it should be trained longer.

Kyuuki93 commented 4 years ago

@AlexeyAB finished training spp,giou,it=0.213,asff(softmax),rfb(bn=1); the chart is nearly flat after 8k iterations: chart

AlexeyAB commented 4 years ago

@Kyuuki93

Yes, first AP50 stops improving, and later AP75 stops.

It is weird that rfb(bn=0) is better than rfb(bn=1) for AP75.

In general, I think spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) is the best model. Try it on the multiclass dataset.

Kyuuki93 commented 4 years ago

On the two-class dataset: all models are based on yolov3-spp.cfg with iou_thresh=0.213 and giou with iou_normalizer=0.5; all are trained with the same batch size and step-lr-policy. C0/C1 mean class-0 and class-1, which include 20k and 80k instances respectively. This table may take two weeks to fill; results will be updated gradually.

| Model | mAP@.5 (C0/C1) | mAP@.75 (C0/C1) |
|---|---|---|
| mse | | |
| giou | 79.53% (69.24%/89.83%) | 59.65% (42.96%/76.34%) |
| giou,cpc | 79.51% (69.07%/89.96%) | 59.52% (42.17%/76.87%) |
| giou,asff,rfb | | |
| giou,asff,dropblock,rfb | | |

cpc means counters_per_class
asff default with softmax
rfb default with bn=1
dropblock default with size=7

AlexeyAB commented 4 years ago

asff default with softmax

Also try asff without softmax, at least for 1 class, just to know whether it is implemented correctly and whether we should use it for the BiFPN implementation.

Kyuuki93 commented 4 years ago

asff default with softmax

Also try asff without softmax, at least for 1 class, just to know whether it is implemented correctly and whether we should use it for the BiFPN implementation.

Ok, I will do it for 1 class, which will be faster.

Kyuuki93 commented 4 years ago

@AlexeyAB The previous table is completed. But I think ASFF normalizes channels once while BiFPN can normalize channels many times; maybe that's why BiFPN can use a simpler norm function.

AlexeyAB commented 4 years ago

@Kyuuki93 Yes, maybe many fusions compensate for the disadvantages of norm_channels with ReLU, since there are many BiFPN blocks with many norm_channels in each BiFPN block in EfficientDet.

So the best models from: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-567010280

| Model | AP@.5 | AP@.75 |
|---|---|---|
| spp,mse,it=0.213,asff(softmax) | 92.45% | 61.83% |
| spp,giou,it=0.213,asff(softmax) | 92.33% | 63.74% |
| spp,giou,it=0.213,asff(softmax),rfb(bn=0) | 91.57% | 64.95% |
| spp,giou,it=0.213,asff(softmax),dropblock(size=7),rfb(bn=1) | 92.12% | 63.88% |

AlexeyAB commented 4 years ago

@Kyuuki93 I would suggest using rfb(bn=1) instead of rfb(bn=0) in your new experiments with 2 classes: https://github.com/AlexeyAB/darknet/issues/4382#issuecomment-568184405

AlexeyAB commented 4 years ago

@Kyuuki93

Also, since we don't use Deformable-conv, you can try the RFB-block with a flexible receptive field from 1x1 to 11x11: https://github.com/AlexeyAB/darknet/issues/4507#issuecomment-568296011

You can test with one class too.

Kyuuki93 commented 4 years ago

@AlexeyAB Seems counters_per_class does not work out the data imbalance issue. Compared with class weights in a classification problem, where the loss comes from the class term only, in a detection problem the loss comes from class, location, and even objectness, and the location and objectness losses are not related to the class label, so a class-loss multiplier cannot fix this. Actually, on my dataset the class of a detected box gets high accuracy; it seems the model just can't find the class-0 objects, which lack data in the training dataset.

Kyuuki93 commented 4 years ago

Also, since we don't use Deformable-conv, you can try the RFB-block with a flexible receptive field from 1x1 to 11x11: #4507 (comment)

By changing activation=linear to activation=leaky?