Closed by dexception 3 years ago
+1
Paper: https://arxiv.org/abs/1905.11946v2
Classifier
EfficientNet B0 (224x224) 0.9 BFLOPS - 0.45 B_FMA (16ms / RTX 2070), 4.9M params: efficientnet_b0.cfg.txt - Training 2.5 days
71.3% Top1 - 90.4% Top5 - accuracy weights file: https://drive.google.com/open?id=1nGdWz76A2EpNhWIfDeAI3hribboilux-
Meanwhile the (official) EfficientNet-B0 (224x224), 0.78 BFLOPS - 0.39 B_FMA, 5.3M params, trained by the official code https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet with batch size equal to 256, has lower accuracy: 70.0% Top1 and 88.9% Top5
Detector - 3.7 BFLOPs, 45.0 mAP@0.5 on COCO test-dev.
cfg-file: enet-coco.cfg.txt
weights file: https://drive.google.com/open?id=1FlHeQjWEQVJt0ay1PVsiuuMzmtNyv36m
efficientnet-lite3-leaky.cfg: top-1 73.0%, top-5 92.4%. - change relu6 to leaky: activation=leaky
https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg
Classifiers - can be trained on ImageNet (ILSVRC2012) using 4 x GPU 2080 Ti:
EfficientNet B0 XNOR (224x224) 0.8 BFLOPS + 25 BOPS (18ms / RTX 2070): efficientnet_b0_xnor.cfg.txt - 5 days
EfficientNet B3 (288x288) 3.5 BFLOPS - 1.8 B_FMA (28ms/RTX 2070): efficientnet_b3.cfg.txt - 11 days
EfficientNet B3 (320x320) 4.3 BFLOPS - 2.2 B_FMA (30ms/RTX 2070): efficientnet_b3_320.cfg.txt - 14 days
EfficientNet B4 (384x384) 10.2 BFLOPS - 5.1 B_FMA (46ms/RTX 2070): efficientnet_b4.cfg.txt - 26 days
Training command:
./darknet classifier train cfg/imagenet1k_c.data cfg/efficientnet_b0.cfg -topk
Continue training:
./darknet classifier train cfg/imagenet1k_c.data cfg/efficientnet_b0.cfg backup/efficientnet_b0_last.weights -topk
Content of imagenet1k_c.data:
classes=1000
train = data/imagenet1k.train_c.list
valid = data/inet.val_c.list
backup = backup
labels = data/imagenet.labels.list
names = data/imagenet.shortnames.list
top=5
Dataset - each image in imagenet1k.train_c.list and inet.val_c.list has one of 1000 labels from imagenet.labels.list, for example n01440764
imagenet.labels.list: https://github.com/AlexeyAB/darknet/blob/master/data/imagenet.labels.list
imagenet.shortnames.list: https://github.com/AlexeyAB/darknet/blob/master/data/imagenet.shortnames.list
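Building those list files can be scripted. A minimal sketch - the directory layout `ILSVRC2012_img_train/<wnid>/*.JPEG` and the function name are assumptions for illustration, not part of the repo:

```python
import os

def build_train_list(root, labels_file, out_list):
    """Write one image path per line; the wnid (e.g. n01440764) must appear
    in each path, since darknet finds the label as a substring of the path."""
    with open(labels_file) as f:
        labels = set(line.strip() for line in f if line.strip())
    with open(out_list, "w") as out:
        # Assumed layout: one sub-directory per WordNet id under `root`.
        for wnid in sorted(os.listdir(root)):
            if wnid not in labels:
                continue  # skip anything that is not one of the 1000 classes
            class_dir = os.path.join(root, wnid)
            for name in sorted(os.listdir(class_dir)):
                out.write(os.path.join(class_dir, name) + "\n")
```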
ILSVRC2012 training dataset - annotated images - 138 GB: https://github.com/AlexeyAB/darknet/blob/master/scripts/get_imagenet_train.sh
ILSVRC2012 validation dataset:
More: http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads
# (width_coefficient, depth_coefficient, resolution, dropout_rate)
'efficientnet-b0': (1.0, 1.0, 224, 0.2),
'efficientnet-b1': (1.0, 1.1, 240, 0.2),
'efficientnet-b2': (1.1, 1.2, 260, 0.3),
'efficientnet-b3': (1.2, 1.4, 300, 0.3),
'efficientnet-b4': (1.4, 1.8, 380, 0.4),
'efficientnet-b5': (1.6, 2.2, 456, 0.4),
'efficientnet-b6': (1.8, 2.6, 528, 0.5),
'efficientnet-b7': (2.0, 3.1, 600, 0.5),
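In the official tensorflow/tpu code these coefficients are applied per stage through two rounding helpers. A sketch following that logic - the 8-channel divisor and the 10% guard mirror the linked implementation, but treat this as an illustration rather than the reference code:

```python
import math

def round_filters(filters, width_coefficient, divisor=8):
    """Scale a channel count by the width coefficient, rounded to a multiple of 8."""
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    # Avoid rounding down by more than 10%, as in the official code.
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)

def round_repeats(repeats, depth_coefficient):
    """Scale the number of block repeats by the depth coefficient (round up)."""
    return int(math.ceil(depth_coefficient * repeats))
```

So for b3 (width 1.2, depth 1.4), a 32-channel stem becomes 40 channels and a block repeated twice is repeated three times.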
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
https://github.com/zsef123/EfficientNets-PyTorch/tree/master/models
In other words, to scale up the CNN, the depth should increase by 20%, the width by 10%, and the image resolution by 15% per scaling step, keeping the network as efficient as possible while growing it and improving its accuracy.
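Those percentages come from the paper's compound-scaling constants α = 1.2 (depth), β = 1.1 (width), γ = 1.15 (resolution), chosen so that α·β²·γ² ≈ 2, i.e. each unit increase of the compound coefficient roughly doubles the FLOPs. A quick check:

```python
alpha, beta, gamma = 1.2, 1.1, 1.15  # depth, width, resolution multipliers
# FLOPs grow linearly with depth and quadratically with width and resolution.
flops_growth = alpha * beta**2 * gamma**2
print(round(flops_growth, 2))  # ~1.92, close to the paper's target of 2
```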
The MBConv block is nothing fancy but an Inverted Residual Block (used in MobileNetV2) with a Squeeze and Excite block injected sometimes.
MobileNetV2: Inverted Residuals and Linear Bottlenecks: https://arxiv.org/pdf/1801.04381v4.pdf
MobileNetV2 graph: http://ethereon.github.io/netscope/#/gist/d01b5b8783b4582a42fe07bd46243986
MobileNetV2 proto: https://github.com/shicai/MobileNet-Caffe/blob/master/mobilenet_v2_deploy.prototxt
MobileNetv2 Darknet-cfg: https://github.com/WePCf/darknet-mobilenet-v2/blob/master/mobilenet/test.cfg (should be trained from the beginning, since src/image.c and examples/classifier.c are modified in the WePCf repo; search for "mobilenet" to see what was changed)
MobileNet_v2:
EfficientNet_b0:
EfficientNet_b0: efficientnet_b0.cfg.txt - Accuracy: Top1 = 57.6%, Top5 = 81.2% - 150 000 iterations (something goes wrong)
Would like to share this link.
https://pypi.org/project/gluoncv2/
Interesting to see the imagenet-1k comparison chart.
Model | Top 1 Error | Top 5 Error | Params | Flops
DarkNet-53 | 21.41 | 5.56 | 41,609,928 | 7,133.86M
EfficientNet-B0b | 23.41 | 6.95 | 5,288,548 | 414.31M
That is a difference of 2% in top-1 error with 1/8 the parameters and 1/17 the FLOPs. Would love to see the inference time and accuracy for object detection.
Also a tiny version wouldn't be bad after all. This is like running yolov3-tiny with yolov3 accuracy.
@dexception Have you ever seen a graphic representation of EfficientNet b1 - b7 models (other than b0), or their exact text description, like Caffe proto-files?
EfficientNet_b4: efficientnet_b4.cfg.txt
@AlexeyAB
Keras, PyTorch and MXNet implementations are definitely there: https://github.com/qubvel/efficientnet https://github.com/lukemelas/EfficientNet-PyTorch https://github.com/titu1994/keras-efficientnets https://github.com/zsef123/EfficientNets-PyTorch https://github.com/DableUTeeF/keras-efficientnet https://github.com/mnikitin/EfficientNet/blob/master/efficientnet_model.py
The code and the research paper differ, but the code is correct. https://github.com/tensorflow/tpu/issues/383
I don't think there is any caffe implementation as of yet.
Hello, I draw the model from Keras implementation: https://github.com/qubvel/efficientnet . Here are b0 and b1.
I use the code:

```python
from efficientnet import EfficientNetB1
from keras.utils import plot_model

model = EfficientNetB1()
plot_model(model, to_file='EfficientNetB1.png')
```
EfficientNet_b0: efficientnet_b0.cfg.txt - Accuracy: Top1 = 19.3%, Top5 = 40.6% (something goes wrong)
Maybe squeeze and excitation blocks are missing?
@WongKinYiu Thanks!
Can you also add model diagram for B4?
Maybe squeeze and excitation blocks are missing?
I think yes, there should be:
@dexception Thanks!
Model diagram for EfficientNets.
@WongKinYiu Thanks!
It seems now it looks like your diagram: efficientnet_b0.cfg.txt
Should be used: trained at least 1.6M iterations with learning_rate=0.256 policy=step scale=0.97 step=10000 (initial learning rate 0.256, decayed by 0.97 every 2.4 epochs) to achieve Top1 = 76.3%, Top5 = 93.2%
Trained weights-file, 500 000 iterations with batch=120: https://drive.google.com/open?id=1MvX0skcmg87T_jn8kDf2Oc6raIb56xq9
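Under darknet's policy=step with scale=0.97 and step=10000, the rate at a given iteration can be sketched as follows (assuming the scale is applied once per completed step, via integer division as in the C code):

```python
def step_lr(iteration, base_lr=0.256, scale=0.97, step=10000):
    # darknet's policy=step: the rate is multiplied by `scale`
    # once every `step` iterations (integer division, as in C).
    return base_lr * scale ** (iteration // step)

print(step_lr(0))      # 0.256
print(step_lr(10000))  # 0.256 * 0.97 = 0.24832
# after 1.6M iterations (160 steps) the rate has decayed to
# roughly 0.97**160 * 0.256, about 0.002
```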
Just [dropout] is used instead of DropConnect.

On your diagrams Lambda is an [avgpool]. MBConv blocks include:

- Squeeze-and-Excitation blocks (layers: [avgpool]->[conv]->[conv]->[scale_channels])
- and a [dropout]-layer before each [shortcut]-residual layer

as it is done here: https://github.com/qubvel/efficientnet/blob/master/efficientnet/model.py

@AlexeyAB Good job! And thank you for sharing the cfg file.
I will also implement SNet of ThunderNet as backbone to compare with EfficientNet.
@WongKinYiu Yes, this is interesting that SNet+ThunderNet achieved the same accuracy 78.6% mAP@0.5 as Yolo v2, but by using 2-stage-detector with 24 FPS on ARM CPU: https://paperswithcode.com/sota/object-detection-on-pascal-voc-2007
@AlexeyAB I also want to implement CEM (Context Enhancement Module) and SAM (Spatial Attention Module) of ThunderNet.
CEM + YOLOv3 got 41.2% mAP@0.5 with 2.85 BFLOPs. CEM + SAM + YOLOv3 got 42.0% mAP@0.5 with 2.90 BFLOPs.
CEM:
SAM:
Results:
I'd be interested in running a trial with efficientnet and sharing the results - do you have a B6 or B7 version of the model? Do I use it in the same way as I would with any of the other cfg files? No need to manually calculate anchors and enter classes in the cfg?
Oh I see - efficientnet is a full Object Detector? But maybe the B7 model with a Yolo head... ?
@LukeAI This is imagenet classification.
Ok so I realise that this is image classification - I have an image classification problem with 7 classes - if necessary I could resize all my images to 32x32 - how could I train/test on my dataset with the .cfg ?
@AlexeyAB Nice work on EfficientNet. If implemented successfully this would give the fastest training and inference time among all implementations.
@AlexeyAB Since we are already discussing the newer models here https://github.com/AlexeyAB/darknet/issues/3114
That issue should be merged with this one, because eventually we will have a yolo-head with EfficientNet once the niggles are sorted out.
Will Swish be implemented in darknet soon? Is it based on ReLU/ReLU6?
Do you have the scale_channels layer implemented? 3q
@WongKinYiu Thanks!
It seems now it looks like your diagram: efficientnet_b0.cfg.txt
- top1 = 68.04%
- top5 = 88.59%
Should be used: Swish instead of leaky-ReLU, and trained at least 1M iterations with learning_rate=0.256 policy=step scale=0.97 step=10000 (initial learning rate 0.256, decayed by 0.97 every 2.4 epochs).
Trained weights-file, 378 000 iterations with batch=120: https://drive.google.com/open?id=1PWbM3en8mOqIbe9kIrEY-ljvvcmTR5AK
Just:

- I use [dropout] instead of DropConnect
- I use activation=leaky (leaky-ReLU with slope=0.1) instead of Swish

On your diagrams Lambda is an [avgpool]. MBConv blocks include:

- Squeeze-and-Excitation blocks (layers: [avgpool]->[conv]->[conv]->[scale_channels])
- and a [dropout]-layer before each [shortcut]-residual layer

as it is done here: https://github.com/qubvel/efficientnet/blob/master/efficientnet/model.py
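The Squeeze-and-Excitation chain above maps directly onto array operations. A numpy sketch of [avgpool]->[conv]->[conv]->[scale_channels], with the 1x1 convolutions written as matrix multiplies (shapes and weights are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(x, w1, w2):
    """x: feature map (C, H, W); w1: (C_mid, C); w2: (C, C_mid).

    [avgpool]        -> squeeze each channel to one value
    [conv], [conv]   -> two 1x1 convolutions (plain matmuls on the pooled vector)
    [scale_channels] -> multiply each input channel by its learned gate
    """
    s = x.mean(axis=(1, 2))         # [avgpool]: (C,)
    z = np.maximum(w1 @ s, 0.0)     # first 1x1 conv + ReLU-like activation
    gate = sigmoid(w2 @ z)          # second 1x1 conv + sigmoid gate: (C,)
    return x * gate[:, None, None]  # [scale_channels]
```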
@AlexeyAB
Can you share other cfg files for EfficientNet ? I would like to give it a try.
@ChenCong7375 @beHappy666
Do you have the scale_channels layer implemented? 3q
Yes.
Will Swish be implemented in darknet soon? Is it based on ReLU/ReLU6?
These are already implemented in the latest commits:

- Swish, which is based on sigmoid: swish = x * sigmoid(x)
- later I will add h-swish = x * ReLU6(x+3) / 6 from MobileNet v3: https://github.com/AlexeyAB/darknet/issues/3494
- Squeeze-and-Excitation blocks, which are based on the [scale_channels]-layer
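The two activations, written out as a small numpy sketch (h-swish per the MobileNet v3 formula quoted above):

```python
import numpy as np

def swish(x):
    # swish = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def hard_swish(x):
    # h-swish = x * ReLU6(x + 3) / 6, where ReLU6 clamps to [0, 6]
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0
```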
@dexception I will add b0, b4 and maybe other models in 1-2 days. I just have to test them. It would be nice if you could train them for about 1-1.5 million iterations (at least 100 epochs with batch=120).
@dexception I will try for sure.
Just want to mention this ....so that we are on track:
EfficientNet B0 stats: a difference of 8.26% Top-1 accuracy versus the official result. A difference of 4.61% Top-5 accuracy versus the official result. Flops: 0.915 vs 0.39 official (2.34 times).
https://github.com/AlexeyAB/darknet/files/3307881/efficientnet_b0.cfg.txt
@dexception
EfficientNet B0 Stats: Difference of 8.26% Top 1 Accuracy with the actual. Difference of 4.61% Top 5 Accuracy with the actual.
It is just because the Swish activation wasn't used - I will add it. And because it was trained for 360 000 iterations instead of 1 600 000, with a different learning rate policy - I will change that.
Flops: 0.915 vs 0.39 with the actual. (2.34 Times)
This is strange, since I used absolutely the same model. Also you can compare the Flops they report for ResNet50 or 101:

- ResNet50: 4.1 BFlops shown in their paper, Table 2: https://arxiv.org/pdf/1905.11946v2.pdf
- ResNet50: 9.74 BFlops shown on Joseph's site: https://pjreddie.com/darknet/imagenet/
- ResNet50: 10 BFlops shown in Darknet
So it seems they count two operations (ADD+MUL) as one FMA operation (which is used in CPUs, GPUs and probably in their TPUs): https://en.wikipedia.org/wiki/FMA_instruction_set
So we use the correct model; we just calculate Flops in different ways, and our approach is correct: https://en.wikipedia.org/wiki/FLOPS
https://github.com/AlexeyAB/darknet/blob/88cccfcad4f9591a429c1e71c88a42e0e81a5e80/src/convolutional_layer.c#L363 https://github.com/AlexeyAB/darknet/blob/88cccfcad4f9591a429c1e71c88a42e0e81a5e80/src/convolutional_layer.c#L550
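The ~2x gap falls out of the counting convention alone. A sketch for a single convolutional layer, mirroring the per-layer formula in darknet's convolutional_layer.c (the layer sizes here are just an example):

```python
def conv_ops(out_c, out_h, out_w, in_c, k, groups=1):
    # One multiply-accumulate (MAC) per kernel weight per output position.
    macs = out_c * out_h * out_w * (in_c // groups) * k * k
    bflops = 2 * macs / 1e9  # darknet: MUL and ADD counted separately
    bfma = macs / 1e9        # paper: one fused multiply-add per MAC
    return bflops, bfma

# e.g. a 3x3 convolution, 256 -> 256 channels, on a 56x56 feature map
bflops, bfma = conv_ops(256, 56, 56, 256, 3)
print(round(bflops, 2), round(bfma, 2))  # the two conventions differ by exactly 2x
```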
Output of ResNet50:
@AlexeyAB Thanks for the explanation. Learned a lot from you. My main objective is to use EfficientNet for object detection. Can't wait to try it.
@AlexeyAB 3q
@dexception @beHappy666 @nseidl @WongKinYiu @LukeAI @mdv3101 @ChenCong7375
I added 4 cfg-files for Classifier EfficientNets: B0, B3, B3_320, B4: https://github.com/AlexeyAB/darknet/issues/3380#issuecomment-501263052
(they use squeeze-and-excitation, swish (x*sigmoid(x)), dropout, residual connections and grouped convolutions)
To get the highest Top1/Top5 results, you should train it at least 1 600 000 iterations with batch=128.
Also I added EfficientNet B0 XNOR, where the depth-wise conv-layers are replaced by XNOR conv-layers.
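XNOR-conv replaces real-valued multiplies with binary ones. A numpy sketch of XNOR-Net-style binarization - weights and inputs reduced to their signs plus a scaling factor; in hardware the ±1 products become XNOR + popcount. The function name and the single-vector setting are illustrative:

```python
import numpy as np

def xnor_dot(w, x):
    """Approximate the dot product w.x with binarized operands, XNOR-Net style."""
    alpha = np.abs(w).mean()  # scaling factor recovering the weight magnitude
    beta = np.abs(x).mean()   # scaling factor recovering the input magnitude
    # sign() products are +/-1, implementable as XNOR + popcount in hardware
    return alpha * beta * np.sign(w) @ np.sign(x)
```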
You can try to train it on ImageNet (ILSVRC2012), I wrote there how to do it: https://github.com/AlexeyAB/darknet/issues/3380#issuecomment-501263052
After you train one of them on ImageNet, it can be used as a pre-trained weights-file for detection networks. Then I will create a Detection network: EfficientNet-backend + TridentNet (or FPN as in Yolov3) + Yolo-head.
I will add GIoU, Mixup, Scale_xy, and maybe new_PAN and Assisted Excitation of Activations, if I have time to make them: https://github.com/AlexeyAB/darknet/projects/1#card-22787888 Then you can train it on MS COCO and get state-of-the-art results.
Also you can try to train EfficientNet on Stylized-ImageNet + ImageNet and get state-of-the-art results:
@AlexeyAB I have always hated the idea of putting the category name inside the image filename. For now I have no choice but to follow it. Eventually it would be better to have a single csv file for classification rather than this.
@dexception This is not Darknet's idea.
This is done in the default ILSVRC2012_img_train.tar (ImageNet).
Maybe in the future I will make an alternative with txt or csv file/files, but this is not a priority.
@AlexeyAB Just started training for EfficientNet b0 model. I have a 2080 TI(only one) machine. batch=128 subdivision=4
I guess it is going to take a week.
@AlexeyAB I can train cifar with EfficientNet_b0 on my titanXp, I think there must be some error in my detection cfg. #3500 I am looking forward to your object-detection work on EfficientNet. Thank you very much.
@dexception here. efficientnet_b0_cg.cfg.txt
I think maybe the scale_channels layer has a CUDA init problem.
When the random parameter of the yolo layer is set to 1, it gets a CUDNN_STATUS_EXECUTION_FAILED error. If cuDNN is disabled, it gets an illegal memory access error.
When the random parameter of the yolo layer is set to 0, everything is fine.
@AlexeyAB I can train cifar with EfficientNet_b0 on my titanXp, I think there must be some error in my detection cfg. #3500 I am looking forward to your object-detection work on EfficientNet. Thank you very much.
@ChenCong7375 Can you try and train on efficientNet-b3 model ? https://github.com/AlexeyAB/darknet/files/3340717/efficientnet_b3.cfg.txt
@WongKinYiu
What init problem do you mean? https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L7-L40
When the random parameter of the yolo layer is set to 1, it gets a CUDNN_STATUS_EXECUTION_FAILED error. If cuDNN is disabled, it gets an illegal memory access error.
When the random parameter of the yolo layer is set to 0, everything is fine.
Do you mean an error occurs during backpropagation from yolo-layer (training)?
As you can see it doesn't use cuDNN: https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L96-L116
To get the correct error location, you should build Darknet with DEBUG=1
@WongKinYiu
What init problem do you mean?
When the random parameter of the yolo layer is set to 1, it gets a CUDNN_STATUS_EXECUTION_FAILED error. If cuDNN is disabled, it gets an illegal memory access error. When the random parameter of the yolo layer is set to 0, everything is fine.
Do you mean an error occurs during backpropagation from yolo-layer (training)?
As you can see it doesn't use cuDNN:
To get the correct error location, you should build Darknet with DEBUG=1
I am not sure whether adding resize_scale_channels_layer to network.c is necessary. Or maybe the error occurs in another layer. I have a fever now; I will check it using DEBUG=1 after I feel better.
@WongKinYiu
I am not sure whether adding resize_scale_channels_layer to network.c is necessary.
Yes, it is there: https://github.com/AlexeyAB/darknet/blob/54e2d0b0e8909bc1da8a2d15113b4f2669ce2f4e/src/scale_channels_layer.c#L42-L58
@WongKinYiu
I am not sure whether adding resize_scale_channels_layer to network.c is necessary.
Yes, it is there:
I mean here. But even though I add resize_scale_channels_layer to network.c, the error still occurs. I have no other idea why it gets an error when random=1 is set.
@WongKinYiu I fixed it: https://github.com/AlexeyAB/darknet/commit/5a6afe96d3aa8aed19405577db7dba0ff173c848
I don't get an error if I set width=320 height=320 random=1
@WongKinYiu I fixed it: 5a6afe9
I don't get an error if I set width=320 height=320 random=1
Hello, previously I got the error at w=h=416, after training 50~80 epochs. I will try the new repo after tomorrow, thank you.
@AlexeyAB The iterations are moving too slowly. They pause after every 20-30 iterations. Is this normal?
@dexception
The iterations are moving too slowly. They pause after every 20-30 iterations. Is this normal?
With what message?
(next TOP5 calculation at 20011 iterations) Tensor Cores are disabled until the first 3000 iterations are reached.
(next TOP5 calculation at 20011 iterations) Tensor Cores are disabled until the first 3000 iterations are reached.
(next TOP5 calculation at 20011 iterations) Tensor Cores are disabled until the first 3000 iterations are reached.
(next TOP5 calculation at 20011 iterations) Tensor Cores are disabled until the first 3000 iterations are reached.
The iterations are moving too slowly. They pause after every 20-30 iterations. Is this normal?
Does the program crash completely after each 30 iterations or just pause for a while?
https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html https://www.youtube.com/watch?v=3svIm5UC94I
This is good.