Islanna / DynamicReLU

Implementation of Dynamic ReLU on Pytorch

Using DynamicReLU in MobileNetV2, finding that it takes a large proportion of parameters compared with parameters of CNN. #5

Closed GarrettLee closed 4 years ago

GarrettLee commented 4 years ago

For example, for a block in MobileNetV2 shown below:

Conv1: 64 × 128 × 1 × 1 = 8192
Conv2 (depthwise): 128 × 1 × 3 × 3 = 1152
Conv3: 128 × 64 × 1 × 1 = 8192
SUM: 8192 + 1152 + 8192 = 17536

While adding DynamicReLU to this block (R=4, K=2):

Conv1: 64 × 128 × 1 × 1 = 8192
DyReLU FC1: 128 × 128/R = 4096
DyReLU FC2: 128/R × 2K = 128
Conv2 (depthwise): 128 × 1 × 3 × 3 = 1152
DyReLU FC1: 128 × 128/R = 4096
DyReLU FC2: 128/R × 2K = 128
Conv3: 128 × 64 × 1 × 1 = 8192
SUM: 8192 + 4096 + 128 + 1152 + 4096 + 128 + 8192 = 25984

The DynamicReLU parameters (2 × (4096 + 128) = 8448) are about 48% of the convolution parameters (17536).
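The arithmetic above can be double-checked with a short script. The helper names below are illustrative, not from the repo; the shapes (channel counts 64/128, R=4, K=2, one DyReLU after each of the first two convolutions) are taken from the breakdown in this thread:

```python
# Parameter counts for the MobileNetV2 block discussed above (bias-free convs).
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a Conv2d: (c_in / groups) * c_out * k * k."""
    return (c_in // groups) * c_out * k * k

def dyrelu_params(c, r, k):
    """The two FC layers of the DyReLU coefficient branch: C*(C/R) + (C/R)*2K."""
    return c * (c // r) + (c // r) * 2 * k

R, K = 4, 2
conv_total = (conv_params(64, 128, 1)                 # 1x1 expansion: 8192
              + conv_params(128, 128, 3, groups=128)  # 3x3 depthwise: 1152
              + conv_params(128, 64, 1))              # 1x1 projection: 8192
dyrelu_total = 2 * dyrelu_params(128, R, K)           # 2 * (4096 + 128) = 8448

print(conv_total)                            # 17536
print(conv_total + dyrelu_total)             # 25984
print(round(dyrelu_total / conv_total, 2))   # 0.48
```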

Increasing R might help, but would that reduce the performance?

Islanna commented 4 years ago

Hi, sorry for the late response!

A significant increase in the number of parameters is a real downside of DyReLU, so it's probably not the best choice for MobileNets. If you want to keep your model small, try increasing R and test performance on a small task. In my experiments R=8 worked just as well as R=4.
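The overhead shrinks roughly linearly in R, which is why the suggestion above is cheap to try. A quick sketch using the same per-DyReLU formula as the breakdown in this thread (C=128, K=2 assumed):

```python
# How DyReLU's extra parameters scale with the reduction ratio R (C=128, K=2).
def dyrelu_params(c, r, k=2):
    # FC1 + FC2 of the coefficient branch: C*(C/R) + (C/R)*2K
    return c * (c // r) + (c // r) * 2 * k

for r in (4, 8, 16):
    print(r, dyrelu_params(128, r))
# 4 4224
# 8 2112
# 16 1056  -> doubling R roughly halves the per-layer overhead
```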

GarrettLee commented 4 years ago

I get it, thanks!