Opened 4 years ago
@biubug6 Isn't the entire idea of v2 and v3 that it increases performance (speed and accuracy)? https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html
The one we are using is the second green dot from the left, which gets 46% accuracy at 6 ms latency. The v2 curve has a point with the same latency but over 50% accuracy, and another point with the same accuracy at 4 ms latency. So: either faster or more accurate.
The only issue is that MobileNet is not designed for GPUs at all.
However, MobileNet V2 relies on depthwise separable convolutions, which are much less optimized in GPU libraries such as cuDNN than dense convolutions are. As a result, MobileNet V2 tends to be slower than ResNet18 in most GPU setups.
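For context, the appeal of depthwise separable convolutions is the parameter/FLOP reduction, which is easy to see by counting weights. A minimal sketch (pure arithmetic, no framework needed):

```python
def standard_conv_params(c_in, c_out, k=3):
    # A standard k x k convolution mixes channels and space in one kernel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k=3):
    # Depthwise: one k x k filter per input channel;
    # pointwise: a 1x1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Example: a 256 -> 512 layer, as in MobileNet v1
print(standard_conv_params(256, 512))        # 1179648
print(depthwise_separable_params(256, 512))  # 133376, roughly 8.8x fewer
```

The catch, as noted above, is that the depthwise part has very low arithmetic intensity, so the theoretical saving often does not translate into GPU wall-clock speedup.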
What about Shufflenetv2?
Edit: Just tried ShuffleNet v2 (0.5x, which gets 60% on ImageNet). RetinaFace inference is now faster. So: faster and more accurate!
Only downside is model is 24MB instead of 1.5MB :) I'll train it out and tell you how it goes.
If you are interested in this project, you are welcome to join us and submit pull requests. I will also run similar experiments and compare with you within the next two days.
The shufflenet v2 0.5x isn't any faster when jitted. It is more accurate though. Ideally I'd go for a shufflenet v2 0.25x but there isn't any pretrained.
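For anyone unfamiliar with what "jitted" means here: it refers to compiling the model with TorchScript. A minimal tracing sketch (the tiny net below is a hypothetical stand-in for the detector):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the detection backbone.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
example = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    # Record the ops executed for a fixed input shape.
    traced = torch.jit.trace(net, example)
    eager_out = net(example)
    jit_out = traced(example)

# Tracing should not change the numerics, only (potentially) the speed.
print(torch.allclose(eager_out, jit_out, atol=1e-6))  # True
```

Whether tracing actually helps depends on the backbone: models built from many small ops (like ShuffleNet's channel shuffles) tend to benefit differently than plain convolution stacks.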
@xsacha shufflenet v2 0.5x is a little slow! The accuracy improvement is not obvious. The results are as follows (WIDER FACE val):
original size: easy 90.75, medium 88.30, hard 73.25
scale size: easy 88.80, medium 87.11, hard 80.77
How much slower are you seeing? I'm getting same time with shufflenet v2 but better accuracy. Similar results to what you are getting. Looks good to me, almost a full 1% higher across the board.
@xsacha
using test.jpg and executing "detect.py":
net forward time: 0.0099
net forward time: 0.0104
net forward time: 0.0100
net forward time: 0.0103
net forward time: 0.0100
net forward time: 0.0104
net forward time: 0.0100
Mobilenet0.25 only consumes approximately 5~6ms.
Hi guys! I tried another version of MobileNet v1 (the only one I could find pretrained on ImageNet). The pretrained model can be downloaded from here: https://pan.baidu.com/s/1eRCxYKU
```python
def __init__(self):
    super(MobileNetV15, self).__init__()
    self.stage1 = nn.Sequential(
        conv_bn(3, 32, 2),     # 3
        conv_dw(32, 64, 1),    # 7
        conv_dw(64, 128, 2),   # 11
        conv_dw(128, 128, 1),  # 19
        conv_dw(128, 256, 2),  # 27
        conv_dw(256, 256, 1),  # 43
    )
    self.stage2 = nn.Sequential(
        conv_dw(256, 512, 2),
        conv_dw(512, 512, 1),
        conv_dw(512, 512, 1),
        conv_dw(512, 512, 1),
        conv_dw(512, 512, 1),
        conv_dw(512, 512, 1),
    )
    self.stage3 = nn.Sequential(
        conv_dw(512, 1024, 2),
        conv_dw(1024, 1024, 1),
    )
    self.avg = nn.AvgPool2d(7)
    self.fc = nn.Linear(1024, 1000)
```
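The snippet above assumes `conv_bn` and `conv_dw` helpers. A minimal sketch of the standard MobileNet v1 building blocks (the repo's own helpers may differ slightly, e.g. using LeakyReLU):

```python
import torch
import torch.nn as nn

def conv_bn(inp, oup, stride):
    # Standard 3x3 convolution + BatchNorm + ReLU.
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True),
    )

def conv_dw(inp, oup, stride):
    # Depthwise 3x3 (groups=inp) followed by a pointwise 1x1.
    return nn.Sequential(
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        nn.BatchNorm2d(inp),
        nn.ReLU(inplace=True),
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True),
    )

# Quick shape check: a stride-1 depthwise block keeps the spatial size.
y = conv_dw(32, 64, 1)(torch.randn(1, 32, 8, 8))
print(tuple(y.shape))  # (1, 64, 8, 8)
```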
The results are slightly better:
I called it v2, but it is actually v1 but with more units. Still didn't finish comparing them. @xsacha would you mind sharing your shufflenet v2 net please?
Interestingly, even though the model is much wider, the inference speed on test.jpg is still impressive:
net forward time: 0.0051-0.0052 (steady across 13 runs)
You are using MobileNet v1 instead of MobileNet v1 0.25x, so it is going to be slower on the same device.
But the net speeds (posted above) for v1 are still fast (although the model size jumped to 26 MB). Net speeds are almost as good as 0.25x.
If you have a decent GPU, the width will just occupy more of the GPU but will not affect latency.
The problem is when you run it on several at a time. GPU load is also important.
@xsacha - makes sense! I'm curious how it compares to your ShuffleNet, would you mind sparing me the google search for Imagenet pretrained version and sharing it please?
Also, are you guys using some predefined evaluation script for WIDER FACE that generates these percentages for easy/medium/hard? I cannot see anything like this in the repository.
Run the "Evaluation widerface val" steps in the README. That gives the AP for easy/medium/hard.
@SnowRipple https://drive.google.com/file/d/1rxWtlghq8Slj9IORguIBeyHcILrM4IR8/view?usp=sharing I used torch.hub to get a pretrained shufflenet
```python
elif net == 'shufflenet':
    backbone = torch.hub.load('pytorch/vision', 'shufflenet_v2_x0_5', pretrained=True)

return_layers = {'stage2': 1, 'stage3': 2, 'stage4': 3}
in_channels_stage2 = 24
```
In this version I reduced the input channels to 48. Also the pretrained has a different normalisation (typical ImageNet norm), which I didn't pass through.
It doesn't affect the latency:
net forward time: 0.0057 (steady across 4 runs)
Load ~33% without JIT and 20% with JIT for single inference of the test image.
Mobilenet was ~37% without JIT and 14% with JIT. So I suppose Mobilenet is better with JIT here. That's why I'd be interested in a 0.25x.
Conclusion: Mobilenet still faster. Alternatives?
@xsacha mnasNet looks interesting and is provided in pytorch.hub
mnasnet has the same issue as efficientnet. All the implementations consume too much memory. I'm trying mnasnet0_5 with a batch size of 32. Return layers are 9, 11 and 12. Seems more accurate but also slower.
Note: because I'm using official version, I don't have latest pretrained. See: https://github.com/pytorch/vision/pull/1224
I tested test.jpg with detect.py using mobile0.25 in CPU mode, but the forward time is very slow (400 ms), far slower than the paper says. Does anyone know why?
@1093842024 I tested the script on CPU (1 core); the inference time is about 70 ms.
@1093842024 Maybe your CPU has no AVX2 instructions, or your PyTorch was compiled without AVX2 support. Did you do a warmup before the timing test?
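Warmup matters because the first forward passes pay one-off costs (memory allocation, kernel selection, lazy initialization). A minimal timing sketch (the tiny net is a hypothetical stand-in for the RetinaFace model):

```python
import time
import torch
import torch.nn as nn

# Hypothetical tiny model; substitute the detector network here.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    for _ in range(5):   # warmup: first calls pay one-off setup costs
        net(x)
    n = 20
    t0 = time.perf_counter()
    for _ in range(n):
        net(x)
    avg = (time.perf_counter() - t0) / n

# On GPU, call torch.cuda.synchronize() before reading the clock,
# since CUDA kernels launch asynchronously.
print(f"net forward time: {avg:.4f}")
```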
So, has anyone used MobileNetV3 as the backbone?
It is very similar to v2 and has the same drawbacks.
@xsacha you mean the speed is lower than v1 on GPU? If I used a MobileNet backbone, it would be to run RetinaFace on edge devices.
@hegc I tried mv2 and mv3 but in my case mv2 was better
@rydenisbak Can you share the results of mobilev2 on the widerface?
Did you test the time of the prior_box part? It takes more time than the net forward.
Has anyone used MobileNetV3 as the backbone and gotten a better result?
@rydenisbak Can you share the results and speeds of mobilev2 on the widerface?
mobilenetv3 has layer channels of 16, 24, 40, 48, ... How can we set return_layers? The FPN expects the channels to double between levels (16 -> 32 -> 64, ...). Any hint would be appreciated.
Here are some improvements:
- mobile0.25 (output channels changed from 64 -> 128, which gains > 1 AP on each subset): Easy Val AP 0.9173621803802692, Medium Val AP 0.8962414033919657, Hard Val AP 0.7556337455094042; CPU 0.0959, GPU 0.0044
- mobilenetv3-Small (128 output channels, 6.6M): Easy Val AP 0.9322953274903119, Medium Val AP 0.9116441066993648, Hard Val AP 0.7799313057781694; CPU 0.0911, GPU 0.0080
- shufflenetv2 (2.2M): Easy Val AP 0.9165965195928714, Medium Val AP 0.8900462910575522, Hard Val AP 0.7416302903715322; CPU 0.1960, GPU 0.0153
Mobilenetv3 looks good (for CPU)! Thanks @quocnhat
Hi, Can you share your mobilenet-v3 small?
https://github.com/rwightman/pytorch-image-models/tree/master/timm has all the models, thanks @rwightman. By the way, the mobilenetv2_100 backbone seems very impressive (fast and accurate):

easy | medium | hard | model_size | FPS (GPU) | FPS (CPU)
0.94127 | 0.9248 | 0.8284 | 8.3 | 167.67 | 4.545

GPU: GTX 1080, CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
@quocnhat why is mobilenetv2 slower than mobilenetv1 on my PC? (CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz)
@quocnhat, would you mind sharing your mobilenetv2_100 net please? I don‘t find it in https://github.com/rwightman/pytorch-image-models/tree/master/timm.
https://rwightman.github.io/pytorch-image-models/
This is what you need, please check
@quocnhat, thanks a lot
@quocnhat, can you share how to solve this problem in mobilenetv2_100 net?
In retinaface.py, set in_channels_list to the three FPN input sizes of your model, for example: in_channels_list = [16, 24, 40]
The CPU speed of shufflenetv2 in your testing result was slower than expected. Did you test the ShuffleNetV2-0.25x or ShuffleNet-0.5x(g=3) on it?
If I did not mention anything else, the network is the same as the default. Please check it.
Thank you for the reference, but I'm still having trouble using timm models, like setting the cfg parameters (especially return_layers) and such. Could you please give more info on that? Thank you
Edit: I managed to use the parameter features_only to obtain the FPN, Thank you for the tips!
You're right. The more expressive the backbone network is, the better it performs, but the slower the speed will be.