biubug6 / Pytorch_Retinaface

Retinaface get 80.99% in widerface hard val using mobilenet0.25.

use mobilenetv3 may Get better results? #3

Open 121786404 opened 4 years ago

biubug6 commented 4 years ago

You're right. The more expressive the backbone network, the better it performs, but the slower it runs.

xsacha commented 4 years ago

@biubug6 Isn't the entire idea of v2 and v3 that they improve performance (both speed and accuracy)? https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html

[figure from the post: ImageNet top-1 accuracy vs. latency for MobileNet v1 and v2]

The one we are using is the second green dot from the left, which gets 46% accuracy at 6 ms latency. V2 has a point at the same latency with over 50% accuracy, and another at the same accuracy with 4 ms latency. So: faster, or more accurate.

The only issue is that MobileNet is not designed for GPUs at all.

However, MobileNet V2 uses depthwise separable convolutions, which are not efficiently supported by GPU libraries such as cuDNN. As a result, MobileNet V2 tends to be slower than ResNet18 in most experimental GPU setups.
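(Editor's note: a minimal sketch illustrating the claim above. On GPU, a depthwise 3x3 conv is often not proportionally faster than a dense 3x3 conv despite far fewer FLOPs, because the grouped/depthwise kernels are less optimized. The shapes and loop counts here are arbitrary choices for illustration.)

```python
import time
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(8, 256, 64, 64, device=device)
dense = nn.Conv2d(256, 256, 3, padding=1).to(device)
depthwise = nn.Conv2d(256, 256, 3, padding=1, groups=256).to(device)  # one filter per channel

def bench(conv, n=50):
    with torch.no_grad():
        for _ in range(10):                      # warmup
            conv(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(n):
            conv(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    return (time.time() - t0) / n

print('dense 3x3    :', bench(dense))
print('depthwise 3x3:', bench(depthwise))  # far fewer FLOPs, often similar wall time on GPU
```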

What about Shufflenetv2?

Edit: Just tried ShuffleNet v2 (0.5x, with ~60% ImageNet accuracy). RetinaFace inference is now faster. So: faster and more accurate!

xsacha commented 4 years ago

The only downside is that the model is 24MB instead of 1.5MB :) I'll train it out and tell you how it goes.

biubug6 commented 4 years ago

If you are interested in this project, you're welcome to join in and submit pull requests. I will also run similar experiments and compare with you within the next two days.

xsacha commented 4 years ago

The ShuffleNet v2 0.5x isn't any faster when jitted. It is more accurate, though. Ideally I'd go for a ShuffleNet v2 0.25x, but there isn't a pretrained one.

biubug6 commented 4 years ago

@xsacha ShuffleNet v2 0.5x is a little slow, and the improvement in accuracy is not obvious. The results (WIDER FACE val) are as follows:

original size: easy 90.75, medium 88.30, hard 73.25
scale size: easy 88.80, medium 87.11, hard 80.77

xsacha commented 4 years ago

How much slower are you seeing? I'm getting the same time with ShuffleNet v2 but better accuracy, with results similar to yours. Looks good to me: almost a full 1% higher across the board.

biubug6 commented 4 years ago

@xsacha
Using test.jpg and executing detect.py:

net forward time: 0.0099–0.0104 s across seven runs

Mobilenet0.25 only takes approximately 5–6 ms.

SnowRipple commented 4 years ago

Hi guys! I tried another version of MobileNet v1 (the only one I could find pretrained on ImageNet). The pretrained model can be downloaded from here: https://pan.baidu.com/s/1eRCxYKU

```python
import torch.nn as nn

# conv_bn and conv_dw are the helper blocks from this repo's models/net.py

class MobileNetV15(nn.Module):
    def __init__(self):
        super(MobileNetV15, self).__init__()
        self.stage1 = nn.Sequential(
            conv_bn(3, 32, 2),      # 3
            conv_dw(32, 64, 1),     # 7
            conv_dw(64, 128, 2),    # 11
            conv_dw(128, 128, 1),   # 19
            conv_dw(128, 256, 2),   # 27
            conv_dw(256, 256, 1),   # 43
        )
        self.stage2 = nn.Sequential(
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
        )
        self.stage3 = nn.Sequential(
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
        )
        self.avg = nn.AvgPool2d(7)
        self.fc = nn.Linear(1024, 1000)

    # forward added for completeness, mirroring this repo's MobileNetV1
    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.avg(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x
```
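(Editor's note: a quick sanity check of the block above; a sketch that assumes `conv_bn`/`conv_dw` are importable from this repo's `models/net.py`.)

```python
import torch
from models.net import conv_bn, conv_dw  # helper blocks used by the class above

model = MobileNetV15().eval()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M params")
with torch.no_grad():
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```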

The results are slightly better:

[image: mobilenets comparison chart] I called it v2, but it is actually v1 with more units. I still haven't finished comparing them. @xsacha would you mind sharing your ShuffleNet v2 net, please?

SnowRipple commented 4 years ago

Interestingly, even though the model is much wider, the inference speed on test.jpg is still impressive:

net forward time: 0.0051–0.0052 s, consistent across thirteen runs

biubug6 commented 4 years ago

You used MobileNetV1 instead of MobileNetV1x0.25, so it's going to be slower on the same device.

SnowRipple commented 4 years ago

But the net speeds (posted above) for v1 are still fast, although the model size jumped to 26MB. The net speeds are almost as good as 0.25.

xsacha commented 4 years ago

If you have a decent GPU, the extra width will just occupy more of the GPU and will not affect latency.

The problem is when you run several inferences at a time; GPU load is also important.

SnowRipple commented 4 years ago

@xsacha - makes sense! I'm curious how it compares to your ShuffleNet. Would you mind sparing me the Google search for the ImageNet-pretrained version and sharing it, please?

Also, are you guys using some predefined evaluation script for WIDER FACE that generates these percentages for easy/medium/hard? I can't see anything like that in this repository.

biubug6 commented 4 years ago

Follow the "Evaluation widerface val" steps in the README: generate the txt results with test_widerface.py, then run the evaluation script under widerface_evaluate/. That produces the AP for easy/medium/hard.

xsacha commented 4 years ago

@SnowRipple https://drive.google.com/file/d/1rxWtlghq8Slj9IORguIBeyHcILrM4IR8/view?usp=sharing I used torch.hub to get a pretrained ShuffleNet:

```python
elif net == 'shufflenet':
    backbone = torch.hub.load('pytorch/vision', 'shufflenet_v2_x0_5', pretrained=True)
    return_layers = {'stage2': 1, 'stage3': 2, 'stage4': 3}
    in_channels_stage2 = 24
```

In this version I reduced the input channels to 48. Also, the pretrained model uses a different normalisation (the typical ImageNet norm), which I didn't pass through.
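(Editor's note: for context, a minimal sketch of how such a backbone plugs in, assuming the `IntermediateLayerGetter` wiring this repo uses in `models/retinaface.py`.)

```python
import torch
import torchvision.models._utils as _utils

backbone = torch.hub.load('pytorch/vision', 'shufflenet_v2_x0_5', pretrained=True)
return_layers = {'stage2': 1, 'stage3': 2, 'stage4': 3}
body = _utils.IntermediateLayerGetter(backbone, return_layers)  # runs children up to stage4

feats = body(torch.randn(1, 3, 640, 640))
for name, f in feats.items():
    print(name, f.shape)  # 48, 96 and 192 channels for the 0.5x model
```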

It doesn't affect the latency:

net forward time: 0.0057 s, consistent across four runs

GPU load is ~33% without JIT and 20% with JIT for a single inference of the test image.

Mobilenet was ~37% without JIT and 14% with JIT, so I suppose Mobilenet benefits more from JIT here. That's why I'd be interested in a 0.25x.

Conclusion: Mobilenet is still faster. Alternatives?
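(Editor's note: for reproducibility, a rough sketch of the kind of eager-vs-JIT timing quoted above; illustrative only, assumes a CUDA device, and uses a warmup-then-average pattern.)

```python
import time
import torch

model = torch.hub.load('pytorch/vision', 'shufflenet_v2_x0_5', pretrained=True).eval().cuda()
x = torch.randn(1, 3, 640, 640, device='cuda')
jitted = torch.jit.trace(model, x)  # JIT via tracing

for name, m in [('eager', model), ('jit', jitted)]:
    with torch.no_grad():
        for _ in range(10):              # warmup before timing
            m(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(100):
            m(x)
        torch.cuda.synchronize()
    print(name, (time.time() - t0) / 100)
```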

SnowRipple commented 4 years ago

@xsacha MnasNet looks interesting and is provided via torch.hub.

xsacha commented 4 years ago

MnasNet has the same issue as EfficientNet: all the implementations consume too much memory. I'm trying mnasnet0_5 with a batch size of 32; the return layers are 9, 11 and 12. It seems more accurate but also slower.

Note: because I'm using the official version, I don't have the latest pretrained weights. See: https://github.com/pytorch/vision/pull/1224

1093842024 commented 4 years ago

I tested test.jpg with detect.py using mobile0.25 in CPU mode, but the forward time is very slow (400 ms), far slower than the paper says. Does anyone know why?

hegc commented 4 years ago

@1093842024 I tested the script on CPU (1 core); the inference time is about 70 ms.

rydenisbak commented 4 years ago

@1093842024 Maybe your CPU has no AVX2 instructions, or your PyTorch was compiled without AVX2 support. Did you do a warmup before the timing test?
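(Editor's note: a quick way to check the AVX2 point; `torch.__config__.show()` prints the build's CPU capability flags.)

```python
import torch

# Look for a line like "CPU capability usage: AVX2" in the build info.
print(torch.__config__.show())
```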

hegc commented 4 years ago

So, has anyone used MobileNetV3 as the backbone?

xsacha commented 4 years ago

It is very similar to v2 and has the same drawbacks.

hegc commented 4 years ago

@xsacha You mean the speed is lower than v1 on GPU? If I use a MobileNet backbone, it's because I want to run RetinaFace on edge devices.

rydenisbak commented 4 years ago

@hegc I tried mv2 and mv3 but in my case mv2 was better

hegc commented 4 years ago

@rydenisbak Can you share the results of MobileNetV2 on WIDER FACE?

brealisty commented 4 years ago

@1093842024 I tested the script on CPU (1 core); the inference time is about 70 ms.

Did you test the time of the prior_box part? It needs more time than the net forward.

yy2yy commented 4 years ago

Has anyone used MobileNetV3 as the backbone and gotten a better result?

zdaiot commented 4 years ago

@rydenisbak Can you share the results and speeds of MobileNetV2 on WIDER FACE?

quocnhat commented 4 years ago

MobileNetV3 has stage channels of 16, 24, 40, 48, ... How can we set return_layers? The code expects the returned channels to double (16 -> 32 -> 64, ...). Any hint would be appreciated.

quocnhat commented 4 years ago

Here are some improvements:

  • mobile0.25, output channel changed from 64 -> 128 (increases AP by > 1 on each split): Easy Val AP 0.9174, Medium Val AP 0.8962, Hard Val AP 0.7556; CPU 0.0959 s, GPU 0.0044 s
  • MobileNetV3 Small, 128 output channels, 6.6M: Easy Val AP 0.9323, Medium Val AP 0.9116, Hard Val AP 0.7799; CPU 0.0911 s, GPU 0.0080 s
  • ShuffleNetV2, 2.2M: Easy Val AP 0.9166, Medium Val AP 0.8900, Hard Val AP 0.7416; CPU 0.1960 s, GPU 0.0153 s

xsacha commented 4 years ago

Mobilenetv3 looks good (for CPU)! Thanks @quocnhat

geoffzhang commented 3 years ago

Here are some improvements:

  • mobile0.25, output channel changed from 64 -> 128 (increases AP by > 1 on each split): Easy Val AP 0.9174, Medium Val AP 0.8962, Hard Val AP 0.7556; CPU 0.0959 s, GPU 0.0044 s
  • MobileNetV3 Small, 128 output channels, 6.6M: Easy Val AP 0.9323, Medium Val AP 0.9116, Hard Val AP 0.7799; CPU 0.0911 s, GPU 0.0080 s
  • ShuffleNetV2, 2.2M: Easy Val AP 0.9166, Medium Val AP 0.8900, Hard Val AP 0.7416; CPU 0.1960 s, GPU 0.0153 s

Hi, can you share your MobileNetV3 Small?

quocnhat commented 3 years ago

All the models are in https://github.com/rwightman/pytorch-image-models/tree/master/timm, thanks @rwightman. By the way, the mobilenetv2_100 backbone seems very impressive (fast and accurate):

| Easy | Medium | Hard | Model size | FPS (GPU) | FPS (CPU) |
|------|--------|------|------------|-----------|-----------|
| 0.94127 | 0.9248 | 0.8284 | 8.3 | 167.67 | 4.545 |

GPU: GTX 1080, CPU: Intel(R) Core(TM) i5-7500 @ 3.40GHz

geoffzhang commented 3 years ago

@quocnhat Why is MobileNetV2 slower than MobileNetV1 on my PC? (CPU: Intel(R) Core(TM) i5-7500 @ 3.40GHz)

bixiwen commented 3 years ago

@quocnhat, would you mind sharing your mobilenetv2_100 net, please? I can't find it in https://github.com/rwightman/pytorch-image-models/tree/master/timm.

quocnhat commented 3 years ago

@quocnhat, would you mind sharing your mobilenetv2_100 net, please? I can't find it in https://github.com/rwightman/pytorch-image-models/tree/master/timm.

https://rwightman.github.io/pytorch-image-models/

This is what you need, please check

bixiwen commented 3 years ago

@quocnhat, thanks a lot

bixiwen commented 3 years ago

MobileNetV3 has stage channels of 16, 24, 40, 48, ... How can we set return_layers? The code expects the returned channels to double (16 -> 32 -> 64, ...). Any hint would be appreciated.

@quocnhat, can you share how you solved this problem for the mobilenetv2_100 net?

quocnhat commented 3 years ago

In retinaface.py, set in_channels_list to the three FPN input widths of your model, e.g. in_channels_list = [16, 24, 40].
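(Editor's note: if you're unsure what those widths are for a given backbone, one way is to run the stages once and read the shapes. A sketch using torchvision's mobilenet_v3_small; illustrative, not this repo's code.)

```python
import torch
from torchvision import models

features = models.mobilenet_v3_small(pretrained=True).features
x = torch.randn(1, 3, 640, 640)
for i, layer in enumerate(features):
    x = layer(x)
    # pick the three stages you return; their channel dims become in_channels_list
    print(i, tuple(x.shape))
```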

noahzhy commented 2 years ago

Here are some improvements:

  • mobile0.25, output channel changed from 64 -> 128 (increases AP by > 1 on each split): Easy Val AP 0.9174, Medium Val AP 0.8962, Hard Val AP 0.7556; CPU 0.0959 s, GPU 0.0044 s
  • MobileNetV3 Small, 128 output channels, 6.6M: Easy Val AP 0.9323, Medium Val AP 0.9116, Hard Val AP 0.7799; CPU 0.0911 s, GPU 0.0080 s
  • ShuffleNetV2, 2.2M: Easy Val AP 0.9166, Medium Val AP 0.8900, Hard Val AP 0.7416; CPU 0.1960 s, GPU 0.0153 s

The CPU speed of ShuffleNetV2 in your test results was slower than expected. Did you test ShuffleNetV2-0.25x or ShuffleNetV2-0.5x (g=3)?

quocnhat commented 2 years ago

Here are some improvements:

  • mobile0.25, output channel changed from 64 -> 128 (increases AP by > 1 on each split): Easy Val AP 0.9174, Medium Val AP 0.8962, Hard Val AP 0.7556; CPU 0.0959 s, GPU 0.0044 s
  • MobileNetV3 Small, 128 output channels, 6.6M: Easy Val AP 0.9323, Medium Val AP 0.9116, Hard Val AP 0.7799; CPU 0.0911 s, GPU 0.0080 s
  • ShuffleNetV2, 2.2M: Easy Val AP 0.9166, Medium Val AP 0.8900, Hard Val AP 0.7416; CPU 0.1960 s, GPU 0.0153 s

The CPU speed of ShuffleNetV2 in your test results was slower than expected. Did you test ShuffleNetV2-0.25x or ShuffleNetV2-0.5x (g=3)?

If I didn't mention anything else, it means the network is the same as the default. Please check.

luisfmnunes commented 2 years ago

@quocnhat, would you mind sharing your mobilenetv2_100 net, please? I can't find it in https://github.com/rwightman/pytorch-image-models/tree/master/timm.

https://rwightman.github.io/pytorch-image-models/

This is what you need, please check

Thank you for the reference, but I'm still having trouble using timm models, such as setting the cfg parameters (especially return_layers). Could you please give more info on that? Thank you.

Edit: I managed to use the features_only parameter to obtain the FPN features. Thank you for the tips!
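(Editor's note: a minimal sketch of that features_only route for anyone landing here later; assumes timm is installed, and the channel counts are whatever feature_info reports.)

```python
import timm
import torch

backbone = timm.create_model('mobilenetv2_100', pretrained=True,
                             features_only=True, out_indices=(2, 3, 4))
print(backbone.feature_info.channels())  # these become in_channels_list

feats = backbone(torch.randn(1, 3, 640, 640))
for f in feats:
    print(f.shape)  # three feature maps at strides 8, 16 and 32, ready for the FPN
```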