Open Donghyun-Son opened 5 years ago
@LukeAI Oh... I forgot to talk about ELASTIC. Thank you for mentioning it. @AlexeyAB Can you help me?
ELASTIC can be applied to any network architecture; it increases accuracy and decreases FLOPS.
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scaling policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient and yet very effective approach to learn a dynamic scale policy from data. We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture. We applied ELASTIC to several state-of-the-art network architectures and showed consistent improvement without extra (sometimes even lower) computation on ImageNet classification, MSCOCO multi-label classification, and PASCAL VOC semantic segmentation. Our results show major improvement for images with scale challenges. Our code is available here: this https URL
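In darknet terms, each ELASTIC block adds a parallel branch that downsamples the block input, applies the same convolutions at half resolution, upsamples, and adds the result back to the full-resolution branch. A rough sketch of one such block, mirroring the residual blocks in the cfg posted below (the paper calls for average pooling in the downsample; [maxpool] here is only a stand-in until an average-pooling layer is available in darknet):

# full-resolution branch: 1x1 bottleneck + 3x3 conv
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

# parallel low-resolution branch: same convs on a pooled copy of the block input
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

# sum of the two branches
[shortcut]
from=-6
activation=linear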
@tbc01204 Hi,
To implement ELASTIC, do we only need
[avgpooling] stride=2 size=2
[batchnorm]
or something else?
@AlexeyAB Hi, Thank you for your help. I guess that's enough.
The following is my elastic-yolov3 cfg file, which can be used with the current darknet. I just change the maxpool to avgpooling and add batch normalization.
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
# yolo net
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
## single gpu
# learning_rate=0.001
# burn_in=1000
# max_batches = 500200
## 2 gpu
# learning_rate=0.0005
# burn_in=2000
# max_batches = 500200
## 4 gpu
learning_rate=0.00025
burn_in=4000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
# darknet53 net
# height=256
# width=256
# channels=3
# min_crop=128
# max_crop=448
# burn_in=1000
# learning_rate=0.1
# policy=poly
# steps=800000
# power=4
# max_batches=800000
# momentum=0.9
# decay=0.0005
# scales=.1,.1
# angle=7
# hue=.1
# saturation=.75
# exposure=.75
# aspect=.75
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
# Downsample
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear
# ELASTIC
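# parallel branch: route back to the block input, pool it to half resolution,
# run the same 1x1 + 3x3 convs, upsample, and add to the full-resolution branch via the [shortcut] below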
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky
# x1
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x2
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky
# x1
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x3
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x4
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x5
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x6
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x7
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x8
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# Downsample
# x1
[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x3
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x5
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x6
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x7
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# x8
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
# ELASTIC
[route]
layers=-3
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear
[upsample]
stride=2
[shortcut]
from=-6
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# classification
# [avgpool]
# [convolutional]
# batch_normalize=1
# filters=1000
# size=1
# stride=1
# pad=1
# activation=linear
# [softmax]
# groups=1
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 6,7,8
anchors = 13, 19, 32, 47, 47,114, 99, 55, 134,129, 80,226, 308,145, 175,285, 351,339
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 156
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 3,4,5
anchors = 13, 19, 32, 47, 47,114, 99, 55, 134,129, 80,226, 308,145, 175,285, 351,339
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 91
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 0,1,2
anchors = 13, 19, 32, 47, 47,114, 99, 55, 134,129, 80,226, 308,145, 175,285, 351,339
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
@tbc01204
I just change the maxpool to avgpooling and add batch normalization.
Do you mean that you didn't change maxpool to avgpooling yet, but you want to change maxpool to avgpooling and add batch normalization?
Yes. I have to change them. Sorry for the confusion.
@WongKinYiu
Is it enough
[route]
layers = -1
group_id=0
groups=2
to implement Elastic models? https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/cfg/csresnext50-elastic.cfg
That is, is there no need to implement anything else?
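For reference, a grouped [route] only splits channels; annotated, the snippet above reads roughly as follows (based on how grouped routes behave in this darknet fork):

# take the output of the previous layer (-1),
# split its channels into 2 equal groups,
# and pass through only group 0, i.e. the first half of the channels;
# there is no spatial pooling here
[route]
layers = -1
groups=2
group_id=0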
@AlexeyAB
If [avgpooling] stride=2 size=2 and [batchnorm] can be implemented, it would be much better.
@WongKinYiu
I added the layer:
[local_avgpool]
size=2 # set any size
stride=2 # set any stride
And fixed the layer
[batchnorm]
Both are tested in training and detection.
Tested with random=1
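With these two layers, one elastic block from the cfg posted earlier in this thread could presumably be rewritten along these lines (a sketch only: [local_avgpool] in place of [maxpool], and a [batchnorm] placed after the [shortcut], as requested in this issue):

# parallel low-resolution branch
[route]
layers=-3

[local_avgpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

# add back to the full-resolution branch
[shortcut]
from=-6
activation=linear

# batch normalization after the residual sum
[batchnorm]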
@AlexeyAB Great, it is really helpful for me.
cspresnet50-elastic:
| pooling type | top-1 (%) | top-5 (%) |
|---|---|---|
| maxpool | 76.8 | 93.5 |
| local_avgpool | 54.9 | 78.4 |
I will stop training the other models that use local_avgpool.
@WongKinYiu Did you use both [local_avgpool] and [batchnorm]? Do you think the bug is in [local_avgpool] or in [batchnorm]? Can you show the cfg-file?
@AlexeyAB Hello,
I only changed all of the [maxpool] layers to [local_avgpool], except for https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/cfg/csresnet50-elastic.cfg#L40
@WongKinYiu Hey,
Would you mind explaining what local_avgpool does exactly? I couldn't find any reference to it in darknet's wiki page or in the issues.
Is it just a standard average pooling layer? For example, it was experimented with in " https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny_contrastive.cfg ". Yet I cannot work out what its output size would be. Its input is 208x208x32, with size and stride equal to 2.
Netron also cannot determine its output shape.
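(Assuming [local_avgpool] follows the same output-size rule as [maxpool], size=2 with stride=2 should simply halve the spatial dimensions and leave the channel count unchanged, so a 208x208x32 input would give a 104x104x32 output.)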
Need average pooling layer and batch normalization without convolutional layer
I am trying to apply ELASTIC, introduced at CVPR 2019, to improve the performance of YOLO v3. I need average pooling that works like darknet's max pooling, not global average pooling. I'm looking at darknet's code to implement average pooling, but it's too hard for me.
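For reference, darknet's existing [avgpool] section pools globally, while what ELASTIC needs is a windowed average pooling analogous to [maxpool]; a sketch of the difference ([avgpooling] is the hypothetical section name proposed in this thread):

# existing darknet layer: global average pooling, W x H x C -> 1 x 1 x C
[avgpool]

# needed for ELASTIC: windowed average pooling, analogous to [maxpool]
# ([avgpooling] is a hypothetical name; it was later implemented elsewhere in this thread as [local_avgpool])
[avgpooling]
size=2
stride=2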
//
I want batch normalization after the shortcut layer. But when I use the batchnorm layer after the shortcut, I can train but cannot test; it fails with the following message.
Below is one block of my cfg file.
Sorry for my bad English. Thanks for reading.