AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

ELASTIC dynamic scale policy +1% Top1 #3792

Open Donghyun-Son opened 5 years ago

Donghyun-Son commented 5 years ago

Needed: an average pooling layer, and a batch normalization layer that works without a convolutional layer

I am trying to apply ELASTIC (introduced at CVPR 2019) to improve the performance of YOLO v3. I need windowed average pooling that behaves like darknet's max pooling, not global average pooling. I have been looking at darknet's code to implement average pooling myself, but it is too hard for me.
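
For reference, a minimal sketch in C of what such a windowed average pooling forward pass would compute (illustrative only, not darknet's actual code; it assumes darknet's CHW float layout and no padding):

// Local (windowed) average pooling forward pass: the same sliding window
// as darknet's maxpool, but averaging instead of taking the max.
void local_avgpool_forward(const float *in, float *out,
                           int w, int h, int c, int size, int stride)
{
    int out_w = (w - size) / stride + 1;
    int out_h = (h - size) / stride + 1;
    for (int k = 0; k < c; ++k) {                  // each channel
        for (int oy = 0; oy < out_h; ++oy) {       // each output row
            for (int ox = 0; ox < out_w; ++ox) {   // each output column
                float sum = 0.f;
                for (int fy = 0; fy < size; ++fy) {
                    for (int fx = 0; fx < size; ++fx) {
                        int iy = oy * stride + fy;
                        int ix = ox * stride + fx;
                        sum += in[(k * h + iy) * w + ix];
                    }
                }
                out[(k * out_h + oy) * out_w + ox] = sum / (size * size);
            }
        }
    }
}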

//

I also want batch normalization after the shortcut layer. But when I use a [batchnorm] layer after the [shortcut], training works, but testing fails with the following message:

 seen 64
Done!
 TOP calculation...
CUDA status Error: file: ./src/dark_cuda.c : () : line: 317 : build time: Aug  6 2019 - 06:07:57
CUDA Error: an illegal memory access was encountered
CUDA Error: an illegal memory access was encountered: No such file or directory
darknet: ./src/utils.c:293: error: Assertion `0' failed.
Aborted (core dumped)

Below is one block of my cfg file.

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

[batchnorm]

[shortcut]
from=-10
activation=leaky

Sorry for my bad English. Thanks for reading.

LukeAI commented 5 years ago

https://github.com/allenai/elastic https://arxiv.org/abs/1812.05262

Donghyun-Son commented 5 years ago

@LukeAI Oh, I forgot to mention ELASTIC. Thank you for linking it. @AlexeyAB Can you help me?

AlexeyAB commented 5 years ago

ELASTIC can be applied to any network architecture; it increases accuracy and decreases FLOPS.


Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scaling policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient and yet very effective approach to learn a dynamic scale policy from data. We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture. We applied ELASTIC to several state-of-the-art network architectures and showed consistent improvement without extra (sometimes even lower) computation on ImageNet classification, MSCOCO multi-label classification, and PASCAL VOC semantic segmentation. Our results show major improvement for images with scale challenges. Our code is available here: https://github.com/allenai/elastic

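In darknet cfg terms, the learned scale policy boils down to running each residual block at two resolutions in parallel and summing the results. A condensed sketch of one such block, taken from the cfg posted in this thread ([maxpool] stands in until average pooling is available):

[convolutional]        # full-resolution branch: 1x1 reduce
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]        # full-resolution branch: 3x3, merged later
filters=64
size=3
stride=1
pad=1
activation=linear

[route]                # branch off the block input again
layers=-3

[maxpool]              # downsample the parallel branch
size=2
stride=2

[convolutional]        # the same bottleneck at half resolution
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[upsample]             # back to full resolution
stride=2

[shortcut]             # sum the two scales
from=-6
activation=linear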

AlexeyAB commented 5 years ago

@tbc01204 Hi,

Do we only need to implement ELASTIC, or something else?

Donghyun-Son commented 5 years ago

@AlexeyAB Hi, thank you for your help. I think that is enough.

The following is my elastic-yolov3 cfg file, which can be used with the current darknet. I just change the maxpool to avgpooling and add batch normalization.

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16

# yolo net
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

## single gpu
# learning_rate=0.001
# burn_in=1000
# max_batches = 500200

## 2 gpu
# learning_rate=0.0005
# burn_in=2000
# max_batches = 500200

## 4 gpu
learning_rate=0.00025
burn_in=4000
max_batches = 500200

policy=steps
steps=400000,450000
scales=.1,.1

# darknet53 net
# height=256
# width=256
# channels=3
# min_crop=128
# max_crop=448

# burn_in=1000
# learning_rate=0.1
# policy=poly
# steps=800000
# power=4
# max_batches=800000
# momentum=0.9
# decay=0.0005

# scales=.1,.1
# angle=7
# hue=.1
# saturation=.75
# exposure=.75
# aspect=.75

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky
# x1
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x2
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=128
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky
# x1
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x3
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x4
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x5
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x6
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x7
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x8
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=256
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# Downsample
# x1
[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x3
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x5
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x6
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x7
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# x8
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

# ELASTIC
[route]
layers=-3

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=512
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# classification

# [avgpool]

# [convolutional]
# batch_normalize=1
# filters=1000
# size=1
# stride=1
# pad=1
# activation=linear

# [softmax]
# groups=1

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 6,7,8
anchors = 13, 19,  32, 47,  47,114,  99, 55, 134,129,  80,226, 308,145, 175,285, 351,339
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 156

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 3,4,5
anchors = 13, 19,  32, 47,  47,114,  99, 55, 134,129,  80,226, 308,145, 175,285, 351,339
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 91

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 0,1,2
anchors = 13, 19,  32, 47,  47,114,  99, 55, 134,129,  80,226, 308,145, 175,285, 351,339
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

AlexeyAB commented 5 years ago

@tbc01204

I just change the maxpool to avgpooling and add batch normalization.

Do you mean that you didn't change maxpool to avgpooling yet, but you want to change maxpool to avgpooling and add batch normalization?

Donghyun-Son commented 5 years ago

@tbc01204

I just change the maxpool to avgpooling and add batch normalization.

Do you mean that you didn't change maxpool to avgpooling yet, but you want to change maxpool to avgpooling and add batch normalization?

Yes. I have to change them. Sorry for the confusion.

AlexeyAB commented 4 years ago

@WongKinYiu

Is it enough

[route]
layers = -1
group_id=0
groups=2

to implement Elastic models? https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/cfg/csresnext50-elastic.cfg

That is, is there no need to implement anything else?
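
(For context: a grouped [route] splits the referenced layer's output channels into groups equal slices and forwards only the slice selected by group_id, so the block above passes through the first half of the previous layer's channels. A minimal sketch of the operation, illustrative only rather than darknet's actual route_layer code, assuming a CHW float layout:)

#include <string.h>

// Split the c input channels into `groups` equal parts and forward
// only part `group_id` (spatial size is unchanged).
void route_group(const float *in, float *out,
                 int w, int h, int c, int groups, int group_id)
{
    int part = c / groups;                 // channels per group
    size_t plane = (size_t)w * h;          // values per channel plane
    memcpy(out, in + (size_t)group_id * part * plane,
           (size_t)part * plane * sizeof(float));
}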

WongKinYiu commented 4 years ago

@AlexeyAB

If [avgpooling] stride=2 size=2 and [batchnorm] can be implemented, that would be much better.

AlexeyAB commented 4 years ago

@WongKinYiu

  1. I added a layer:

    [local_avgpool]
    size=2       # set any size
    stride=2     # set any stride

  2. And fixed the layer:

    [batchnorm]

Both are tested in training and detection, also with random=1.
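
With these two layers, the parallel branch of an ELASTIC block from the cfg earlier in this thread could presumably be written as follows (a sketch, not a tested cfg):

[route]
layers=-3

[local_avgpool]        # average pooling instead of [maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=1
activation=leaky

[convolutional]
filters=64
size=3
stride=1
pad=1
activation=linear

[upsample]
stride=2

[shortcut]
from=-6
activation=linear

[batchnorm]            # standalone batch normalization after the merge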

WongKinYiu commented 4 years ago

@AlexeyAB Great, it is really helpful for me.

WongKinYiu commented 4 years ago

cspresnet50-elastic:

pooling type    top-1   top-5
maxpool         76.8    93.5
local_avgpool   54.9    78.4

I will stop training the other models that use local_avgpool.

AlexeyAB commented 4 years ago

@WongKinYiu Did you use both [local_avgpool] and [batchnorm]? Do you think the bug is in [local_avgpool] or in [batchnorm]? Can you show the cfg-file?

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

I only changed all of the [maxpool] layers to [local_avgpool], except for https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/cfg/csresnet50-elastic.cfg#L40

kadirbeytorun commented 3 years ago

@WongKinYiu Hey,

Would you mind explaining what local_avgpool does exactly? I couldn't find any reference to it in darknet's wiki page or in the issues.

Is it just a standard average pooling layer? For example, it was experimented with in "https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny_contrastive.cfg", yet I cannot work out what its output size would be. Its input is 208x208x32, with size and stride equal to 2.

https://netron.app/?url=https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny_contrastive.cfg

Netron also cannot determine its output shape.
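
A note on the shape question: assuming [local_avgpool] follows the same shape rule as darknet's [maxpool] (out_w = (w + padding - size) / stride + 1, with padding defaulting to size - 1), a 208x208x32 input with size=2 and stride=2 gives

out_w = out_h = (208 + 1 - 2) / 2 + 1 = 104

so the output would be 104x104x32: each 2x2 window is replaced by its average, halving the width and height and leaving the channel count unchanged.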