EfficientNet | Implementation ?

dexception commented 5 years ago

https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html https://www.youtube.com/watch?v=3svIm5UC94I

This is good.

nseidl commented 5 years ago

+1

AlexeyAB commented 5 years ago

Paper: https://arxiv.org/abs/1905.11946v2

Classifier

EfficientNet B0 (224x224) 0.9 BFLOPS - 0.45 B_FMA (16ms / RTX 2070), 4.9M params: efficientnet_b0.cfg.txt - Training 2.5 days
71.3% Top1 - 90.4% Top5 - accuracy weights file: https://drive.google.com/open?id=1nGdWz76A2EpNhWIfDeAI3hribboilux-

While (Official) EfficientNetB0 (224x224) 0.78 BFLOPS - 0.39 FMA, 5.3M params - that is trained by official code https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet with batch size equals to 256 has lower accuracy: 70.0% Top1 and 88.9% Top5

Detector - 3.7 BFLOPs, 45.0 mAP@0.5 on COCO test-dev.

cfg-file: enet-coco.cfg.txt
weights file: https://drive.google.com/open?id=1FlHeQjWEQVJt0ay1PVsiuuMzmtNyv36m

aaa

efficientnet-lite3-leaky.cfg: top-1 73.0%, top-5 92.4%. - change relu6 to leaky: activation=leaky https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg

Classifiers: - Can be trained on ImageNet(ILSVRC2012) by using 4 x GPU 2080 TI:

EfficientNet B0 XNOR (224x224) 0.8 BFLOPS + 25 BOPS (18ms / RTX 2070): efficientnet_b0_xnor.cfg.txt - 5 days
EfficientNet B3 (288x288) 3.5 BFLOPS - 1.8 B_FMA (28ms/RTX 2070): efficientnet_b3.cfg.txt - 11 days
EfficientNet B3 (320x320) 4.3 BFLOPS - 2.2 B_FMA (30ms/RTX 2070): efficientnet_b3_320.cfg.txt - 14 days
EfficientNet B4 (384x384) 10.2 BFLOPS - 5.1 B_FMA (46ms/RTX 2070): efficientnet_b4.cfg.txt - 26 days

Training command: ./darknet classifier train cfg/imagenet1k_c.data cfg/efficientnet_b0.cfg -topk

Continue training: ./darknet classifier train cfg/imagenet1k_c.data cfg/efficientnet_b0.cfg backup/efficientnet_b0_last.weights -topk

Content of imagenet1k_c.data:

classes=1000
train  = data/imagenet1k.train_c.list
valid  = data/inet.val_c.list
backup = backup
labels = data/imagenet.labels.list
names  = data/imagenet.shortnames.list
top=5

Dataset - each image in imagenet1k.train_c.list and inet.val_c.list has one of 1000 labels from imagenet.labels.list, for example n01440764

imagenet.labels.list: https://github.com/AlexeyAB/darknet/blob/master/data/imagenet.labels.list
imagenet.shortnames.list: https://github.com/AlexeyAB/darknet/blob/master/data/imagenet.shortnames.list
ILSVRC2012 training dataset - annotated images - 138 GB: https://github.com/AlexeyAB/darknet/blob/master/scripts/get_imagenet_train.sh
ILSVRC2012 validation dataset:
- images - 6.3 GB: http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar
- annotations - 2.2 MB: http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_bbox_val_v3.tgz Set validation labels: https://github.com/AlexeyAB/darknet/blob/master/scripts/imagenet_label.sh read: https://pjreddie.com/darknet/imagenet/

More: http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads

Models: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L39

      # (width_coefficient, depth_coefficient, resolution, dropout_rate)
      'efficientnet-b0': (1.0, 1.0, 224, 0.2),
      'efficientnet-b1': (1.0, 1.1, 240, 0.2),
      'efficientnet-b2': (1.1, 1.2, 260, 0.3),
      'efficientnet-b3': (1.2, 1.4, 300, 0.3),
      'efficientnet-b4': (1.4, 1.8, 380, 0.4),
      'efficientnet-b5': (1.6, 2.2, 456, 0.4),
      'efficientnet-b6': (1.8, 2.6, 528, 0.5),
      'efficientnet-b7': (2.0, 3.1, 600, 0.5),

CLICK ME - EfficientNet B0 model details

```cpp #alpha=1.2, beta=1.1, gamma=1.15 #d=pow(alpha, fi), w=pow(beta, fi), r=pow(gamma, fi) #fi=0, d=1.0, w=1.0, r=1.0 - theoretically # in practice: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L40 # 'efficientnet-b0': (1.0, 1.0, 224, 0.2): # width=1.0, depth=1.0, resolution=224, dropout=0.2 BLOCKS 1 - 7: 'r1_k3_s11_e1_i32_o16_se0.25', 'r2_k3_s22_e6_i16_o24_se0.25', 'r2_k5_s22_e6_i24_o40_se0.25', 'r3_k3_s22_e6_i40_o80_se0.25', 'r3_k5_s11_e6_i80_o112_se0.25', 'r4_k5_s22_e6_i112_o192_se0.25', 'r1_k3_s11_e6_i192_o320_se0.25', In details: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L61-L69 BLOCK-1 # r1_k3_s11_e1_i32_o16_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 1 # input_filters=int(options['i']), input_filters = 32 # output_filters=int(options['o']), output_filters = 16 # expand_ratio=int(options['e']), expand_ratio = 1 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 BLOCK-2 # r2_k3_s22_e6_i16_o24_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 2 # input_filters=int(options['i']), input_filters = 16 # output_filters=int(options['o']), output_filters = 24 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-3 # r2_k5_s22_e6_i24_o40_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 2 # input_filters=int(options['i']), input_filters = 24 # output_filters=int(options['o']), output_filters = 40 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-4 # r3_k3_s22_e6_i40_o80_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 3 # input_filters=int(options['i']), input_filters = 40 # output_filters=int(options['o']), output_filters = 80 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-5 # r3_k5_s11_e6_i80_o112_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 3 # input_filters=int(options['i']), input_filters = 80 # output_filters=int(options['o']), output_filters = 112 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 BLOCK-6 # r4_k5_s22_e6_i112_o192_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 4 # input_filters=int(options['i']), input_filters = 112 # output_filters=int(options['o']), output_filters = 192 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-7 # r1_k3_s11_e6_i192_o320_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 1 # input_filters=int(options['i']), input_filters = 192 # output_filters=int(options['o']), output_filters = 320 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 ``` ![efficientnet_b0_ext](https://user-images.githubusercontent.com/4096485/60291516-b28cfc80-9923-11e9-998a-9ad764889cb0.png)

CLICK ME - EfficientNet B3 model details

```cpp #alpha=1.2, beta=1.1, gamma=1.15 #d=pow(alpha, fi), w=pow(beta, fi), r=pow(gamma, fi) #fi=3, d=1.73, w=1.33, r=1.52 - theoretically # in practice: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L40 # 'efficientnet-b3': (1.2, 1.4, 300, 0.3): # width=1.2, depth=1.4, resolution=300 (320), dropout=0.3 # https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L120-L125 # repeats_new = int(math.ceil(depth * repeats)) ### ceil - Rounds x upward, # https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L134-L137 # width_coefficient=width_coefficient, # depth_coefficient=depth_coefficient, # depth_divisor=8, # min_depth=None) # # https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L101-L117 # multiplier = width_coefficient = 1.2 # divisor = 8 # min_depth = none # min_depth = divisor = 8 filters = filters * 1.2 new_filters = max(8, (int(filters + 4) // 8) * 8) ## //===floor in this case if new_filters < 0.9 * filters: new_filters += 8 16 *1.2=19,2 new_filters = max(8, int(19,2+4)//8 * 8) = 16 (>=16) 24 *1.2=28,8 new_filters = max(8, int(28,8+4)//8 * 8) = 32 (>24) 32 *1.2=38,4 new_filters = max(8, int(38,4+4)//8 * 8) = 40 (>32) 40 *1.2=48 new_filters = max(8, int(48+4)//8 * 8) = 48 (>40) 80 *1.2=96 new_filters = max(8, int(96+4)//8 * 8) = 96 (>80) 112 *1.2=134,4 new_filters = max(8, int(134,4+4)//8 * 8) = 136 (>112) 192 *1.2=230,4 new_filters = max(8, int(230,4+4)//8 * 8) = 232 (>192) 320 *1.2=384 new_filters = max(8, int(384+4)//8 * 8) = 384 (>320) 8 *1.2=9,6 new_filters = max(8, int(9,6+4)//8 * 8) = 8 (==8) 64 *1.2=76,8 new_filters = max(8, int(76,8+4)//8 * 8) = 80 (>64) 96 *1.2=115,2 new_filters = max(8, int(115,2+4)//8 * 8) = 112 (>96) 144 *1.2=172,8 new_filters = max(8, int(172,8+4)//8 * 8) = 176 (>144) 384 *1.2=460,8 new_filters = max(8, int(460,8+4)//8 * 8) = 464 (>384) 576 *1.2=691,2 new_filters = max(8, int(691,2+4)//8 * 8) = 688 (>576) 960 *1.2=1152 new_filters = max(8, int(1152+4)//8 * 8) = 1152 (>960) 1280 *1.2=1536 new_filters = max(8, int(1536+4)//8 * 8) = 1536 (>1280) BLOCKS 1 - 7: (for b0) 'r2_k3_s11_e1_i32_o16_se0.25', 'r4_k3_s22_e6_i16_o24_se0.25', 'r4_k5_s22_e6_i24_o40_se0.25', 'r6_k3_s22_e6_i40_o80_se0.25', 'r6_k5_s11_e6_i80_o112_se0.25', 'r8_k5_s22_e6_i112_o192_se0.25', 'r2_k3_s11_e6_i192_o320_se0.25', In details: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L61-L69 BLOCK-1 # r1_k3_s11_e1_i32_o16_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 2 //1 # input_filters=int(options['i']), input_filters = 40 //32 # output_filters=int(options['o']), output_filters = 16 //16 # expand_ratio=int(options['e']), expand_ratio = 1 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 BLOCK-2 # r2_k3_s22_e6_i16_o24_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 3 //2 # input_filters=int(options['i']), input_filters = 16 //16 # output_filters=int(options['o']), output_filters = 32 //24 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-3 # r2_k5_s22_e6_i24_o40_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 3 //2 # input_filters=int(options['i']), input_filters = 32 //24 # output_filters=int(options['o']), output_filters = 48 //40 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-4 # r3_k3_s22_e6_i40_o80_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 5 //3 # input_filters=int(options['i']), input_filters = 48 //40 # output_filters=int(options['o']), output_filters = 96 //80 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-5 # r3_k5_s11_e6_i80_o112_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 5 //3 # input_filters=int(options['i']), input_filters = 96 //80 # output_filters=int(options['o']), output_filters = 136 //112 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 BLOCK-6 # r4_k5_s22_e6_i112_o192_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 6 //4 # input_filters=int(options['i']), input_filters = 136 //112 # output_filters=int(options['o']), output_filters = 232 //192 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-7 # r1_k3_s11_e6_i192_o320_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 2 //1 # input_filters=int(options['i']), input_filters = 232 //192 # output_filters=int(options['o']), output_filters = 384 //320 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 ``` ![efficientnet_b3](https://user-images.githubusercontent.com/4096485/60367923-32849680-99f8-11e9-8d75-4051c37dde92.png)

CLICK ME - EfficientNet B4 model details

```cpp #alpha=1.2, beta=1.1, gamma=1.15 #d=pow(alpha, fi), w=pow(beta, fi), r=pow(gamma, fi) #fi=4, d=2.07, w=1.46, r=1.75 - theoretically # in practice: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L40 # efficientnet-b4': (1.4, 1.8, 380, 0.4): # width=1.4, depth=1.8, resolution=380, dropout=0.4 # https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L120-L125 # repeats_new = int(math.ceil(depth * repeats)) ### ceil - Rounds x upward, # https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L134-L137 # width_coefficient=width_coefficient, # depth_coefficient=depth_coefficient, # depth_divisor=8, # min_depth=None) # # https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L101-L117 # multiplier = width_coefficient = 1.4 # divisor = 8 # min_depth = none # min_depth = divisor = 8 filters = filters * 1.4 new_filters = max(8, (int(filters + 4) // 8) * 8) ## //===floor in this case if new_filters < 0.9 * filters: new_filters += 8 16 *1.4=22.4 new_filters = max(8, int(22.4+4)//8 * 8) = 24 (>16) 24 *1.4=33.6 new_filters = max(8, int(33.6+4)//8 * 8) = 32 (>24) 32 *1.4=44.8 new_filters = max(8, int(44.8+4)//8 * 8) = 48 (>32) 40 *1.4=56 new_filters = max(8, int(56+4)//8 * 8) = 56 (>40) 80 *1.4=112 new_filters = max(8, int(112+4)//8 * 8) = 112 (>80) 112 *1.4=156,8 new_filters = max(8, int(156,8+4)//8 * 8) = 160 (>112) 192 *1.4=268,8 new_filters = max(8, int(268,8+4)//8 * 8) = 272 (>192) 320 *1.4=448 new_filters = max(8, int(448+4)//8 * 8) = 448 (>320) 8 *1.4=11,2 new_filters = max(8, int(11,2+4)//8 * 8) = 8 (==8) 64 *1.4=89,6 new_filters = max(8, int(89,6+4)//8 * 8) = 88 (>64) 96 *1.4=134,4 new_filters = max(8, int(134,4+4)//8 * 8) = 136 (>96) 144 *1.4=201,6 new_filters = max(8, int(201,6+4)//8 * 8) = 200 (>144) 384 *1.4=537,6 new_filters = max(8, int(537,6+4)//8 * 8) = 536 (>384) 576 *1.4=806,4 new_filters = max(8, int(806,4+4)//8 * 8) = 808 (>576) 960 *1.4=1344 new_filters = max(8, int(1344+4)//8 * 8) = 1344 (>960) 1280 *1.4=1792 new_filters = max(8, int(1792+4)//8 * 8) = 1792 (>1280) BLOCKS 1 - 7: 'r2_k3_s11_e1_i32_o16_se0.25', 'r4_k3_s22_e6_i16_o24_se0.25', 'r4_k5_s22_e6_i24_o40_se0.25', 'r6_k3_s22_e6_i40_o80_se0.25', 'r6_k5_s11_e6_i80_o112_se0.25', 'r8_k5_s22_e6_i112_o192_se0.25', 'r2_k3_s11_e6_i192_o320_se0.25', In details: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L61-L69 BLOCK-1 # r1_k3_s11_e1_i32_o16_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 2 //1 # input_filters=int(options['i']), input_filters = 48 //32 # output_filters=int(options['o']), output_filters = 24 //16 # expand_ratio=int(options['e']), expand_ratio = 1 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 BLOCK-2 # r2_k3_s22_e6_i16_o24_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 4 //2 # input_filters=int(options['i']), input_filters = 24 //16 # output_filters=int(options['o']), output_filters = 32 //24 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-3 # r2_k5_s22_e6_i24_o40_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 4 //2 # input_filters=int(options['i']), input_filters = 32 //24 # output_filters=int(options['o']), output_filters = 56 //40 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-4 # r3_k3_s22_e6_i40_o80_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 6 //3 # input_filters=int(options['i']), input_filters = 56 //40 # output_filters=int(options['o']), output_filters = 112 //80 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-5 # r3_k5_s11_e6_i80_o112_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 6 //3 # input_filters=int(options['i']), input_filters = 112 //80 # output_filters=int(options['o']), output_filters = 160 //112 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 BLOCK-6 # r4_k5_s22_e6_i112_o192_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 5 # num_repeat=int(options['r']), num_repeat = 8 //4 # input_filters=int(options['i']), input_filters = 160 //112 # output_filters=int(options['o']), output_filters = 272 //192 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 2,2 BLOCK-7 # r1_k3_s11_e6_i192_o320_se0.25 # return efficientnet_model.BlockArgs( # kernel_size=int(options['k']), kernel_size = 3 # num_repeat=int(options['r']), num_repeat = 2 //1 # input_filters=int(options['i']), input_filters = 272 //192 # output_filters=int(options['o']), output_filters = 448 //320 # expand_ratio=int(options['e']), expand_ratio = 6 # id_skip=('noskip' not in block_string), # se_ratio=float(options['se']) if 'se' in options else None, se_ratio = 0.25 # strides=[int(options['s'][0]), int(options['s'][1])]) strides = 1,1 ``` ![efficientnet_b4_ext](https://user-images.githubusercontent.com/4096485/60291492-a3a64a00-9923-11e9-8cdc-c7fb2f7388ba.png)

In other words, to scale up the CNN, the depth of layers should increase 20%, the width 10% and the image resolution 15% to keep things as efficient as possible while expanding the implementation and improving the CNN accuracy.

The MBConv block is nothing fancy but an Inverted Residual Block (used in MobileNetV2) with a Squeeze and Excite block injected sometimes.

MobileNetV2: Inverted Residuals and Linear Bottlenecks: https://arxiv.org/pdf/1801.04381v4.pdf
MobileNetV2 graph: http://ethereon.github.io/netscope/#/gist/d01b5b8783b4582a42fe07bd46243986
MobileNetV2 proto: https://github.com/shicai/MobileNet-Caffe/blob/master/mobilenet_v2_deploy.prototxt
MobileNetv2 Darknet-cfg: https://github.com/WePCf/darknet-mobilenet-v2/blob/master/mobilenet/test.cfg (should be trained from the begining, since the src/image.c and examples/classifier.c are modified in WePCf-repo, search for "mobilenet" to see what are changed)

https://towardsdatascience.com/mobilenetv2-inverted-residuals-and-linear-bottlenecks-8a4362f4ffd5

MobileNet_v2:

EfficientNet_b0:

1_OpvSpqMP61IO_9cp4mAXnA

1_BvAqynrNCq5RjMesSPvPgg

AlexeyAB commented 5 years ago

EfficientNet_b0: efficientnet_b0.cfg.txt - Accuracy: Top1 = 57.6%, Top5 = 81.2% - 150 000 iterations (something goes wrong)

efficientnet_b0_ext

dexception commented 5 years ago

Would like to share this link.

https://pypi.org/project/gluoncv2/

Interesting to see the imagenet-1k comparison chart.

Model | Top 1 Error | Top 5 Error | Params | Flops DarkNet-53 | 21.41 | 5.56 | 41,609,928 | 7,133.86M EfficientNet-B0b | 23.41 | 6.95 | 5,288,548 | 414.31M

With the difference of 2% in top 1 error with number of parameters are 1/8 and 1/17 less flops. Would love to see the inference time and accuracy as object detection.

Also a tiny version wouldn't be bad after all. This is like running yolov3-tiny with yolov3 accuracy.

AlexeyAB commented 5 years ago

@dexception Have you ever seen a graphic representation of EfficientNet b1 - b7 models (other than b0), or their exact text description, like Caffe proto-files?

AlexeyAB commented 5 years ago

EfficientNet_b4: efficientnet_b4.cfg.txt

dexception commented 5 years ago

@AlexeyAB

Keras, Pytorch and Mxnet implementation is definitely there: https://github.com/qubvel/efficientnet https://github.com/lukemelas/EfficientNet-PyTorch https://github.com/titu1994/keras-efficientnets https://github.com/zsef123/EfficientNets-PyTorch https://github.com/DableUTeeF/keras-efficientnet https://github.com/qubvel/efficientnet https://github.com/mnikitin/EfficientNet/blob/master/efficientnet_model.py

The code and research paper is different. But the code is correct. https://github.com/tensorflow/tpu/issues/383

I don't think there is any caffe implementation as of yet.

WongKinYiu commented 5 years ago

Hello, I draw the model from Keras implementation: https://github.com/qubvel/efficientnet . Here are b0 and b1.

CLICK ME - EfficientNet B0 and B1 model diagrams

![EfficientNetB0](https://user-images.githubusercontent.com/12152972/59659841-ad200b80-91d9-11e9-946e-6f625055ab5b.png) ![EfficientNetB1](https://user-images.githubusercontent.com/12152972/59660179-73033980-91da-11e9-8b41-85bf9c8acc47.png)

I use the code: `from efficientnet import EfficientNetB1 from keras.utils import plot_model

model = EfficientNetB1() plot_model(model, to_file='EfficientNetB1.png')`

WongKinYiu commented 5 years ago

EfficientNet_b0: efficientnet_b0.cfg.txt - Accuracy: Top1 = 19.3%, Top5 = 40.6% (something goes wrong)

Maybe squeeze and excitation blocks are missing?

AlexeyAB commented 5 years ago

@WongKinYiu Thanks!

Can you also add model diagram for B4?

Maybe squeeze and excitation blocks are missing?

I think yes, there should be:

squeeze and excitation blocks https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7 and https://arxiv.org/abs/1709.01507v4 and https://github.com/hujie-frank/SENet
dropout
batch=120 and should be trained at least 1 000 000 - 1 600 000 iterations

AlexeyAB commented 5 years ago

@dexception Thanks!

WongKinYiu commented 5 years ago

Model diagram for EfficientNets.

CLICK ME - EfficientNet B0 model diagram

![EfficientNetB0](https://user-images.githubusercontent.com/12152972/59659841-ad200b80-91d9-11e9-946e-6f625055ab5b.png)

CLICK ME - EfficientNet B1 model diagram

![EfficientNetB1](https://user-images.githubusercontent.com/12152972/59660179-73033980-91da-11e9-8b41-85bf9c8acc47.png)

CLICK ME - EfficientNet B2 model diagram

![EfficientNetB2](https://user-images.githubusercontent.com/12152972/59730393-c9bd5180-9274-11e9-8b77-53ec014608cb.png)

CLICK ME - EfficientNet B3 model diagram

![EfficientNetB3](https://user-images.githubusercontent.com/12152972/59730385-c1fdad00-9274-11e9-91e3-e24936a975ed.png)

CLICK ME - EfficientNet B4 model diagram

![EfficientNetB4](https://user-images.githubusercontent.com/12152972/59724088-480cfa00-925b-11e9-9193-d6968a69b717.png)

CLICK ME - EfficientNet B5 model diagram

![EfficientNetB5](https://user-images.githubusercontent.com/12152972/59730380-be6a2600-9274-11e9-82d9-1ab45a7488b3.png)

CLICK ME - EfficientNet B6 model diagram

![EfficientNetB6](https://user-images.githubusercontent.com/12152972/59730374-b9a57200-9274-11e9-83eb-1f8a434f33e2.png)

CLICK ME - EfficientNet B7 model diagram

![EfficientNetB7](https://user-images.githubusercontent.com/12152972/59730365-aeeadd00-9274-11e9-85e1-f27c171196e6.png)

AlexeyAB commented 5 years ago

@WongKinYiu Thanks!

It seems now it looks like your diagram: efficientnet_b0.cfg.txt

top1 = 69.49%
top5 = 89.44%

Should be used: should be trained at least 1.6 M iterations with learning_rate=0.256 policy=step scale=0.97 step=10000 (initial learning rate 0.256 that decays by 0.97 every 2.4 epochs) to achieve Top1 = 76.3%, Top5 = 93.2%

Trained weights-file, 500 000 iterations with batch=120: https://drive.google.com/open?id=1MvX0skcmg87T_jn8kDf2Oc6raIb56xq9

chart

Just

I use [dropout] instead of DropConnect

On your diagrams Lambda is a [avgpool].

MBConv blocks includes:

Squeeze-and-Excitation blocks (layers: [avgpool]->[conv]->[conv]->[scale_channels])
and [dropout]-layer before each [shortcut]-residual layer as it is done here: https://github.com/qubvel/efficientnet/blob/master/efficientnet/model.py
Swish activations

efficientnet_b0_ext

WongKinYiu commented 5 years ago

@AlexeyAB Good job! And thank you for sharing the cfg file.

I will also implement SNet of ThunderNet as backbone to compare with EfficientNet.

AlexeyAB commented 5 years ago

@WongKinYiu Yes, this is interesting that SNet+ThunderNet achieved the same accuracy 78.6% mAP@0.5 as Yolo v2, but by using 2-stage-detector with 24 FPS on ARM CPU: https://paperswithcode.com/sota/object-detection-on-pascal-voc-2007

WongKinYiu commented 5 years ago

@AlexeyAB I also want to implement CEM (Context Enhancement Module) and SAM (Spatial Attention Module) of ThunderNet.

CEM + YOLOv3 got 41.2% mAP@0.5 with 2.85 BFLOPs. CEM + SAM + YOLOv3 got 42.0% mAP@0.5 with 2.90 BFLOPs.

CEM:

SAM:

Results:

LukeAI commented 5 years ago

I'd be interested in running a trial with efficientnet and sharing the results - do you have a B6 or B7 version of the model? Do I use it in the same way as I would with any of the other cfg files? No need to manually calculate anchors and enter classes in the cfg?

LukeAI commented 5 years ago

Oh I see - efficientnet is a full Object Detector? But maybe the B7 model with a Yolo head... ?

dexception commented 5 years ago

@LukeAI This is imagenet classification.

LukeAI commented 5 years ago

Ok so I realise that this is image classification - I have an image classification problem with 7 classes - if necessary I could resize all my images to 32x32 - how could I train/test on my dataset with the .cfg ?

WongKinYiu commented 5 years ago

@LukeAI https://pjreddie.com/darknet/train-cifar/

mdv3101 commented 5 years ago

@AlexeyAB Nice work on EfficientNet. If implemented successfully this would give the fastest training and inference time among all implementations.

dexception commented 5 years ago

@AlexeyAB Since we are already discussing the newer models here https://github.com/AlexeyAB/darknet/issues/3114

This issue should be merged with this. Because eventually we will have yolo-head with EfficientNet once the niggles are sorted out.

ChenCong7375 commented 5 years ago

Will Swish be implemented in darknet soon? which is based on RELU/RELU6?

beHappy666 commented 5 years ago

Do you have scale_channels layer implement?3q

dexception commented 5 years ago

@WongKinYiu Thanks!

It seems now it looks like your diagram: efficientnet_b0.cfg.txt

top1 = 68.04%

top5 = 88.59%

Should be used: should be used Swish instead of leaky-ReLU, should be trained at least 1M iterations with learning_rate=0.256 policy=step scale=0.97 step=10000 (initial learning rate 0.256 that decays by 0.97 every 2.4 epochs)

Trained weights-file, 378 000 iterations with batch=120: https://drive.google.com/open?id=1PWbM3en8mOqIbe9kIrEY-ljvvcmTR5AK

Just

I use [dropout] instead of DropConnect

I use activation=leaky-relu (slope=0.1) instead of Swish

On your diagrams Lambda is a [avgpool].

MBConv blocks includes:

Squeeze-and-Excitation blocks (layers: [avgpool]->[conv]->[conv]->[scale_channels])

and [dropout]-layer before each [shortcut]-residual layer as it is done here: https://github.com/qubvel/efficientnet/blob/master/efficientnet/model.py

@AlexeyAB

Can you share other cfg files for EfficientNet ? I would like to give it a try.

AlexeyAB commented 5 years ago

@ChenCong7375 @beHappy666

Do you have scale_channels layer implement?3q

Yes.

Will Swish be implemented in darknet soon? which is based on RELU/RELU6?

There are already implemented in the last commits:

Swish is based on sigmoid, swish = x * sigmoid(x) later I will add h-swish = x * ReLU6(x+3) / 6 from MobileNet v3: https://github.com/AlexeyAB/darknet/issues/3494
Squeeze-n-excitation blocks that is based on [scale_channels]-layer

AlexeyAB commented 5 years ago

@dexception I will add b0, b4 and may be other models in 1-2 days. I just should test it. It would be nice if you can train them about 1-1.5 million iterations (at least 100 epochs with batch=120).

dexception commented 5 years ago

@dexception I will try for sure.

Just want to mention this ....so that we are on track:

EfficientNet B0 Stats: Difference of 8.26% Top 1 Accuracy with the actual. Difference of 4.61% Top 5 Accuracy with the actual. Flops: 0.915 vs 0.39 with the actual. (2.34 Times)

https://github.com/AlexeyAB/darknet/files/3307881/efficientnet_b0.cfg.txt

AlexeyAB commented 5 years ago

@dexception

EfficientNet B0 Stats: Difference of 8.26% Top 1 Accuracy with the actual. Difference of 4.61% Top 5 Accuracy with the actual.

It is just because there wasn't used Siwsh-activation - I will add. And because it was trained 360 000 iterations instead of 1 600 000 iterations with another learning rate policy - I will change.

Flops: 0.915 vs 0.39 with the actual. (2.34 Times)

This is strange, since I used absolutely the same model. Also you can compare their Flops for ResNet50 or 101 with

ResNet50: 4.1 BFlops shown in their paper Table 2: https://arxiv.org/pdf/1905.11946v2.pdf
ResNet50: 9.74 BFlops shown in Joseph's site: https://pjreddie.com/darknet/imagenet/
ResNet50: 10 BFlops shown in Darknet

So it seems they calculate two operations (ADD+MUL) as one FMA-operation (which is used in CPUs, GPUs and probably in their TPUs): https://en.wikipedia.org/wiki/FMA_instruction_set

So we use correct model, just we calculate Flops in different ways, our approach is correct: https://en.wikipedia.org/wiki/FLOPS

https://github.com/AlexeyAB/darknet/blob/88cccfcad4f9591a429c1e71c88a42e0e81a5e80/src/convolutional_layer.c#L363 https://github.com/AlexeyAB/darknet/blob/88cccfcad4f9591a429c1e71c88a42e0e81a5e80/src/convolutional_layer.c#L550

Output of ResNet50:

dexception commented 5 years ago

@AlexeyAB Thanks for the explanation. Learned alot from you. My main objective is to use EfficientNet for Object Detection. Can't wait to try it.

beHappy666 commented 5 years ago

@AlexeyAB 3q

AlexeyAB commented 5 years ago

@dexception @beHappy666 @nseidl @WongKinYiu @LukeAI @mdv3101 @ChenCong7375

I added 4 cfg-files Classifier EfficientNets: B0, B3, B3_320, B4: https://github.com/AlexeyAB/darknet/issues/3380#issuecomment-501263052 (there are used: squeeze-n-excitation, swish-sigmoid, dropout, residual-connections, grouped-convolutionals) To get the highest Top1/Top5 results, you should train it at least 1 600 000 iterations with batch=128.

Also I added EfficientNet B0 XNOR where are replaced Depth-wise-conv-layers by XNOR-conv-layers.

You can try to train it on ImageNet (ILSVRC2012), I wrote there how to do it: https://github.com/AlexeyAB/darknet/issues/3380#issuecomment-501263052

After you train one of them on ImageNet, it can be used as pre-trained weights-files for detection networks. Then I will create a Detection network: EfficientNet-backend + TridentNet (or FPN as in Yolov3) + Yolo_Head

I will add GIoU, Mixup, Scale_xy, and may be new_PAN and Assisted Excitation of Activations, If I have time to make them: https://github.com/AlexeyAB/darknet/projects/1#card-22787888 Then you can train it on MS COCO and get state-of-art results.

Also you can try to train EfficientNet on Stylized-ImageNet + ImageNet and get state-of-art results:

dexception commented 5 years ago

@AlexeyAB I have always hated the idea of putting names of the categories within the name of images. For now i have no choice but to follow it. Eventually it would better to have a single csv file for classification rather than this.

AlexeyAB commented 5 years ago

@dexception This is not Darknet's idea. This is done in default ILSVRC2012_img_train.tar (ImageNet). Maybe in the future I will make an alternative with txt or csv file/files, but this is not a priority.

dexception commented 5 years ago

@AlexeyAB Just started training for EfficientNet b0 model. I have a 2080 TI(only one) machine. batch=128 subdivision=4

I guess it is going to take a week.

ChenCong7375 commented 5 years ago

@AlexeyAB I can train cifar with EfficientNet_b0 on my titanXp, I think there must be some error in my detection cfg. #3500 I am looking forward to your object-detection work on EfficientNet. Thank you very much.

ChenCong7375 commented 5 years ago

@dexception here. efficientnet_b0_cg.cfg.txt

WongKinYiu commented 5 years ago

I think maybe scale_channels_layer has cuda init problem.

When random parameter of yolo layer is set to 1, it will get CUDNN_STATUS_EXECUTION_FAILED error. If disable cudnn, it will get illegal memory access error.

When random parameter of yolo layer is set to 0, anything is fine.

dexception commented 5 years ago

@AlexeyAB I can train cifar with EfficientNet_b0 on my titanXp, I think there must be some error in my detection cfg. #3500 I am looking forward to your object-detection work on EfficientNet. Thank you very much.

@ChenCong7375 Can you try and train on efficientNet-b3 model ? https://github.com/AlexeyAB/darknet/files/3340717/efficientnet_b3.cfg.txt

AlexeyAB commented 5 years ago

@WongKinYiu

What init problem do you mean? https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L7-L40

When random parameter of yolo layer is set to 1, it will get CUDNN_STATUS_EXECUTION_FAILED error. If disable cudnn, it will get illegal memory access error.

When random parameter of yolo layer is set to 0, anything is fine.

Do you mean an error occurs during backpropagation from yolo-layer (training)?

As you can see it doesn't use cuDNN: https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L96-L116

To get correct error place, you should build Darknet with DEBUG=1

WongKinYiu commented 5 years ago

@WongKinYiu

What init problem do you mean?

https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L7-L40

When random parameter of yolo layer is set to 1, it will get CUDNN_STATUS_EXECUTION_FAILED error. If disable cudnn, it will get illegal memory access error. When random parameter of yolo layer is set to 0, anything is fine.

Do you mean an error occurs during backpropagation from yolo-layer (training)?

As you can see it doesn't use cuDNN:

https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L96-L116

To get correct error place, you should build Darknet with DEBUG=1

I am not sure add resize_scale_channels_layer to network.c is necessary or not. Or maybe error occurs in other layer. I get a fever now, I will check it using 'DEBUG=1' after I feel better.

AlexeyAB commented 5 years ago

@WongKinYiu

I am not sure add resize_scale_channels_layer to network.c is necessary or not.

Yes, it is there: https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L42-L58

WongKinYiu commented 5 years ago

@WongKinYiu

I am not sure add resize_scale_channels_layer to network.c is necessary or not.

Yes, it is there:

https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L42-L58

I mean here. But even though I add resize_scale_channels_layer to network.c, the error still occurs. I have no other idea why it will get error when set random=1 now.

AlexeyAB commented 5 years ago

@WongKinYiu I fixed it: https://github.com/AlexeyAB/darknet/commit/5a6afe96d3aa8aed19405577db7dba0ff173c848

I don't get an error if I set width=320 height=320 random=1

WongKinYiu commented 5 years ago

@WongKinYiu I fixed it: 5a6afe9

I don't get an error if I set width=320 height=320 random=1

Hello, previous I got error at w=h=416, after training 50~80 epochs. I will try new repo after tomorrow, thank you.

dexception commented 5 years ago

@AlexeyAB The iterations are moving too slowly. They are stopping after every 20-30 iterations. Is this normal ?

AlexeyAB commented 5 years ago

@dexception

The iterations are moving too slowly. They are stopping after every 20-30 iterations. Is this normal ?

With what message?

dexception commented 5 years ago

(next TOP5 calculation at 20011 iterations) Tensor Cores are disabled until the first 3000 iterations are reached.

AlexeyAB commented 5 years ago

The iterations are moving too slowly. They are stopping after every 20-30 iterations. Is this normal ?

Does the program crash completely after each 30 iterations or just pause for a while?

AlexeyAB / darknet

EfficientNet | Implementation ? #3380