deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai
22.74k stars 5.33k forks

SCRFD issues #1518

Open SthPhoenix opened 3 years ago

SthPhoenix commented 3 years ago

Hi! I'm testing your new SCRFD face detector and have noticed some issues with onnx inference code and network outputs:

  1. In scrfd.py line 275 you are filtering bboxes, but later at line 278 you return det, so the max_num parameter has no effect and may cause exceptions.

  2. Later, at line 335, you call the detector without providing an input shape, which won't work with a model that has a dynamic shape. However, it isn't an issue when called from face_analysis.py.

  3. I have noticed that the detector returns very low scores, or even fails, on faces occupying more than ~40% of the image. It's especially visible for square-shaped images, where no additional padding can be provided during the resize process. I have also noticed that in such cases accuracy increases when lowering the detection size (e.g. 480x480) and decreases when increasing it (e.g. 1024x1024). Here is an example of detection at 640x640 scale (original image size is 1200x1200): with a resize to 640x640 the score is 0.38; for 480x480 it is 0.86; and for 736x736 it is 0.07. The same behavior occurs for both the scrfd_10g_bnkps and scrfd_2.5g_bnkps models. In some cases it might be fixed by adding fixed padding around the image, but that might decrease accuracy for other image types, so it can't be applied by default.
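The resize-with-padding workaround mentioned in point 3 can be sketched as follows. This is a minimal, dependency-free illustration (the function name is mine, and real code would use cv2.resize instead of the nearest-neighbour indexing used here):

```python
import numpy as np

def letterbox(img: np.ndarray, target: int = 640, pad_value: int = 0):
    """Resize img to fit inside a target x target square, preserving aspect
    ratio, and pad the remainder with pad_value."""
    h, w = img.shape[:2]
    scale = target / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour index maps (a stand-in for cv2.resize)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((target, target) + img.shape[2:], pad_value, dtype=img.dtype)
    out[:nh, :nw] = resized
    return out, scale  # scale lets callers map boxes back to original coords

img = np.zeros((1200, 800, 3), dtype=np.uint8)
padded, scale = letterbox(img, 640)
print(padded.shape, round(scale, 3))  # (640, 640, 3) 0.533
```

Note that for a square 1200x1200 input the scale factor fills the whole square, which is exactly the "no padding possible" case described above.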

BTW: Thanks for your great work!

nttstar commented 3 years ago

Hi @SthPhoenix, thanks for your attention.

  1. Just fixed.
  2. For models that support dynamic input, you should pass the input_size param to the detect() method.
  3. It actually depends on the anchor design. But generally, it works well at a 640 input size. You can try more pictures.
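For reference, the max_num handling from point 1 presumably amounts to returning the truncated array rather than the full one. A minimal sketch (hypothetical, not the actual scrfd.py code, which also weighs boxes toward the image center):

```python
import numpy as np

def keep_top_faces(det: np.ndarray, max_num: int = 0) -> np.ndarray:
    """Keep at most max_num detections, preferring the largest boxes.
    det rows are [x1, y1, x2, y2, score]; max_num <= 0 keeps everything."""
    if max_num <= 0 or det.shape[0] <= max_num:
        return det
    areas = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
    order = np.argsort(areas)[::-1]      # largest faces first
    return det[order[:max_num]]          # the truncated array is what gets returned

dets = np.array([[0, 0, 10, 10, 0.9],
                 [0, 0, 100, 100, 0.8],
                 [0, 0, 50, 50, 0.7]])
print(keep_top_faces(dets, 2)[:, 2])  # the two largest boxes remain
```

The bug report above was that the filtered result was computed but the unfiltered det was returned.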
nttstar commented 3 years ago

BTW: if you're in a single-face situation, an input size of 384/256 (or even 128) without padding is recommended.

SthPhoenix commented 3 years ago
  1. That's great! Thanks!
  2. Yes, I just meant that the example code will throw an exception out of the box, though it's easily fixable.
  3. The new detector works just great for other image types, besides the ones with large faces.

BTW: if you're in a single-face situation, an input size of 384/256 (or even 128) without padding is recommended.

I'm developing a face recognition REST API based on InsightFace models and a TensorRT inference backend. The problem comes with unconstrained heterogeneous input, like random photo galleries or user-provided images, where optimal settings can't be chosen in advance. The original RetinaFace detector works great in this scenario, but it has slightly lower accuracy and much lower speed than your new SCRFD detector.

It looks like the new detector was mostly optimized for small faces and is a bit undertrained for large faces. Can this be fixed during training, or is it a design flaw?

nttstar commented 3 years ago

My suggestion (if you want to use the pre-trained models):

Option 1: Use a 512 input size.
Option 2: Use a combination of 640 and 256 inputs with some engineering tricks, which is only 16% more FLOPs than the single 640 input.
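A rough sketch of what option 2 could look like: run detection at both input sizes, map the boxes back to original-image coordinates, then merge the pooled detections with NMS. The merge logic below is my own illustration of the idea, not code from the repo:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, both [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def merge_two_scales(det_a, det_b, iou_thr=0.5):
    """Greedy NMS over detections pooled from two input sizes.
    Rows are [x1, y1, x2, y2, score] in original-image coordinates."""
    det = np.concatenate([det_a, det_b])
    det = det[np.argsort(det[:, 4])[::-1]]   # highest score first
    keep = []
    while len(det):
        best, rest = det[0], det[1:]
        keep.append(best)
        det = rest[iou(best, rest[:, :4]) < iou_thr]  # drop overlapping duplicates
    return np.stack(keep) if keep else det

# A large face scores poorly at 640 but well at 256 (numbers from the thread)
det_640 = np.array([[100., 100., 500., 500., 0.07]])
det_256 = np.array([[105., 102., 498., 503., 0.86]])
merged = merge_two_scales(det_640, det_256)
print(merged[:, 4])  # only the high-score detection survives
```

The engineering tricks nttstar alludes to (e.g. routing only part of the image through the second pass) are not spelled out here; this only shows the merge step.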

SthPhoenix commented 3 years ago

Thanks! I was investigating these options yesterday; option 2 is more promising but logically more complicated. In the long term retraining seems a better solution. Could you give any hints on what parameters could be tuned?

SthPhoenix commented 3 years ago

Little update: this bug seems to be related only to *_bnkps models. Models without key point detection work as expected.

SthPhoenix commented 3 years ago
@nttstar , I have retrained scrfd_2.5g_bnkps with batch norm replaced by group norm (the new model should be called scrfd_2.5g_gnkps, I think), just like for the scrfd_2.5g model. I achieved the following WiderFace AP:

Model Easy Medium Hard
scrfd_2.5g_bnkps 93.80 92.02 77.13
scrfd_2.5g_gnkps 93.57 91.70 76.08

As you can see this config gives a small accuracy decrease, while completely solving the problem with large faces. For the above example I'm getting a score around 0.7.

nttstar commented 3 years ago

@SthPhoenix Thanks! Did you make the feature maps shared? BTW, you can open a new repo to place this new model so that I can give a link to it, if you want.

SthPhoenix commented 3 years ago

@SthPhoenix Thanks! Did you make the feature maps shared?

No, shared feature maps seem to reduce accuracy more noticeably.

BTW, you can open a new repo to place this new model so that I can give a link to it, if you want.

I'm training scrfd_10g_gnkps right now; both models will be included in the InsightFace-REST repo, though it would be great if you mention it ) Also I can make a pull request with updated configs for the 0.5, 2.5, 10 and 34g models and a dockerfile for training SCRFD in Docker.

xxxpsyduck commented 3 years ago

@SthPhoenix So you're still using WiderFace and only changed the config?

SthPhoenix commented 3 years ago

@SthPhoenix So you're still using WiderFace and only changed the config?

Yes, just that.

nttstar commented 3 years ago

No, shared feature maps seem to reduce accuracy more noticeably.

Have you tested it? How about the mAP?

SthPhoenix commented 3 years ago

I have tested it by modifying scrfd_500m.py (as it trains faster) as follows:

norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=True,

After epoch 640 I got the following mAP: 0.881, 0.851, 0.619, which is much lower than the results you reported for the same model without KPS, and even lower than the newer scrfd_500m_bnkps you published yesterday.

So I have trained the scrfd_2.5g_gnkps model, and am now training the scrfd_10g_gnkps model, with the following config:

norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,

BTW, the scrfd_500m_gnkps model showed no improvement on large faces, though I'm not sure if that's connected to strides_share=True; I'll try retraining this model and checking again.

nttstar commented 3 years ago

Shared feature maps should be better when using GN, from my experiments with a ResNet-based backbone.

SthPhoenix commented 3 years ago

Shared feature maps should be better when using GN, from my experiments with a ResNet-based backbone.

Hmmm, I'll check it on other models, thanks!

SthPhoenix commented 3 years ago

I have released the retrained models in my repo. Model accuracy on the WiderFace benchmark:

Model Easy Medium Hard
scrfd_10g_gnkps 95.51 94.12 82.14
scrfd_2.5g_gnkps 93.57 91.70 76.08
scrfd_500m_gnkps 88.70 86.11 63.57

All models were trained with following settings:

norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,

The scrfd_10g_gnkps model was trained up to epoch 720; for some reason it gives the best results at this checkpoint, though all the other models begin to degrade after epoch 640.

czzbb commented 3 years ago

@nttstar , I have retrained scrfd_2.5g_bnkps with batch norm replaced by group norm (the new model should be called scrfd_2.5g_gnkps, I think), just like for the scrfd_2.5g model. I achieved the following WiderFace AP:

Model Easy Medium Hard
scrfd_2.5g_bnkps 93.80 92.02 77.13
scrfd_2.5g_gnkps 93.57 91.70 76.08

As you can see this config gives a small accuracy decrease, while completely solving the problem with large faces. For the above example I'm getting a score around 0.7.

Hi, I just ran CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_2.5g.py 4 and only achieved 62.4 AP. Did I miss anything?

SthPhoenix commented 3 years ago

Hi @czzbb ! My config was based on scrfd_2.5g_bnkps.py modified according to my previous post.

czzbb commented 3 years ago

Hi @czzbb ! My config was based on scrfd_2.5g_bnkps.py modified according to my previous post.

Hi, I can't reproduce the official results (I got 62.4, while it should be 77 AP), so I wonder if there is anything I missed. I downloaded the dataset and annotations, and then directly ran CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_2.5g.py 4

SthPhoenix commented 3 years ago

Hi, I can't reproduce the official results (I got 62.4, while it should be 77 AP), so I wonder if there is anything I missed. I downloaded the dataset and annotations, and then directly ran CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_2.5g.py 4

If you are using the original configs you should get mAP close to the published values without issues. Have you tested mAP using the evaluation script, or is this the mAP logged during training? You should refer to the mAP output by the evaluation script.

czzbb commented 3 years ago

Hi, I can't reproduce the official results (I got 62.4, while it should be 77 AP), so I wonder if there is anything I missed. I downloaded the dataset and annotations, and then directly ran CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_2.5g.py 4

If you are using the original configs you should get mAP close to the published values without issues. Have you tested mAP using the evaluation script, or is this the mAP logged during training? You should refer to the mAP output by the evaluation script.

Thanks a lot! I got the 62.4 AP during training, but I get 77 AP using the evaluation script. I didn't expect the difference could be so huge.

tyxsspa commented 2 years ago

@nttstar @SthPhoenix Hi, can you explain why GN works for big faces? I retrained scrfd_10g_bnkps with specific big-face augmentation (faces occupying >80% of the image), adding ~20% big faces to each batch, and the model can handle big-face detection. Then I retrained scrfd_34g_bnkps, and it works badly: no big faces are detected. Then I changed to scrfd_34g_gnkps, and it works, but GN is not supported by TensorRT (@SthPhoenix can you tell me how to convert GN ONNX models to TRT models?)

SthPhoenix commented 2 years ago

Then I changed to scrfd_34g_gnkps, and it works, but GN is not supported by TensorRT (@SthPhoenix can you tell me how to convert GN ONNX models to TRT models?)

In TensorRT you need to initialize the optional plugin library (GN runs as a plugin). With the Python TRT API, just add this line right after the import:

import tensorrt as trt
trt.init_libnvinfer_plugins(None, "")  # registers the standard nvinfer plugins, including group norm

markdchoung commented 1 year ago

Hi, @tuoyuxiang @SthPhoenix

I followed the guide above, but I cannot reproduce the model's performance. The model performs well on the WiderFace validation dataset, but poorly on images with large faces.

Did you train with the following configuration?

norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,

Did you train the model without any other additional methods, and does the trained model then perform well on large faces? How many GPUs did you use? The total batch size varies with the number of GPUs; could this also affect performance? I used 8 GPUs for training.
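On the batch-size point: the total batch size does change with GPU count, and the usual first-order correction is the linear scaling rule, where the learning rate scales in proportion to the total batch size. A sketch (the base values are illustrative, not taken from the SCRFD configs):

```python
def scaled_lr(base_lr: float, base_batch: int, gpus: int, per_gpu_batch: int) -> float:
    """Linear scaling rule: learning rate grows in proportion to total batch size."""
    total_batch = gpus * per_gpu_batch
    return base_lr * total_batch / base_batch

# e.g. a config tuned for a total batch of 32, rerun on 8 GPUs x 8 images
print(scaled_lr(0.01, 32, 8, 8))  # 0.02
```

Whether the published configs assume a particular GPU count is worth checking before applying this.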

tyxsspa commented 1 year ago

Hi, @tuoyuxiang @SthPhoenix

I followed the guide above, but I cannot reproduce the model's performance. The model performs well on the WiderFace validation dataset, but poorly on images with large faces.

Did you train with the following configuration?

norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,

Did you train the model without any other additional methods, and does the trained model then perform well on large faces? How many GPUs did you use? The total batch size varies with the number of GPUs; could this also affect performance? I used 8 GPUs for training.

I use this to improve large faces, you can try it: https://modelscope.cn/models/damo/cv_resnet_facedetection_scrfd10gkps/summary

JackLin-Authme commented 1 year ago

Hi, there.

I found that using the default scale augmentation (range = [0.3, 0.45, ..., 2.0]) in the training process can make tons of tiny faces become negative samples under ATSS, but in the WiderFace evaluation these tiny faces are important for the hard-protocol score.

(screenshot: 2023-04-20 11:35:19)

Based on the ATSS algorithm, one step is to select anchors inside gt boxes. These tiny faces may not be covered by the selected anchors, or may have too few anchors assigned to predict them.

(screenshot: 2023-04-20 11:23:02)

Has anyone else hit this, and how do you explain or solve it?

By the way, I solved the large-face problem by replacing the SGD optimizer with AdamW.
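In mmdetection-style configs (which the SCRFD training code uses), that optimizer swap would look roughly like this. The lr and weight_decay values are illustrative, not the ones used in the thread:

```python
# mmdetection-style optimizer config: swap SGD for AdamW
# (illustrative values; tune lr / weight_decay for your setup)
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.05)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

AdamW's decoupled weight decay often makes training less sensitive to the lr schedule than SGD with momentum, which may be why it helps here.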

NahidEbrahimian commented 8 months ago

dockerfile for training scrfd in docker.

@SthPhoenix Can you share your docker file with me for training scrfd in docker?

I made a dockerfile, but during training in Docker, at specific epochs, the training stopped with an error.

Jack-Lin-NTU commented 7 months ago


I read the SCRFD paper; the authors say faces smaller than 4x4 pixels are dropped, but the source code doesn't seem to add this constraint. Is that right?
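For concreteness, the constraint in question would amount to something like this during ground-truth preparation (a hypothetical sketch of the paper's rule, not the actual source code):

```python
import numpy as np

def drop_tiny_faces(gt_boxes: np.ndarray, min_size: float = 4.0) -> np.ndarray:
    """Drop ground-truth boxes smaller than min_size x min_size pixels,
    as described in the SCRFD paper."""
    w = gt_boxes[:, 2] - gt_boxes[:, 0]
    h = gt_boxes[:, 3] - gt_boxes[:, 1]
    return gt_boxes[(w >= min_size) & (h >= min_size)]

boxes = np.array([[0, 0, 3, 3], [10, 10, 30, 30]], dtype=float)
print(drop_tiny_faces(boxes))  # only the 20x20 box survives
```

Whether this filter should run before or after scale augmentation (which changes face sizes, per the comment above) is exactly the open question.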