WARNING: The latest update may cause problems with previously compiled Numba functions. If you encounter errors about 'modules not found', run the following command in the repo root to remove __pycache__ directories:
find . | grep -E "(__pycache__|\.pyc$)" | sudo xargs rm -rf
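If you would rather do the cleanup from Python, for example on a system without grep or xargs, the sketch below removes the same paths the shell command targets: __pycache__ directories and .pyc files. The function name clean_pycache is hypothetical and not part of this repo. Like the shell version, which uses sudo, it needs write permission on the files it deletes.

```python
from pathlib import Path
import shutil


def clean_pycache(root: str = ".") -> int:
    """Delete __pycache__ directories and stray .pyc files under root.

    Hypothetical helper mirroring the shell one-liner; returns the number
    of directories and files removed.
    """
    removed = 0
    root_path = Path(root)
    # Collect the directories first so the tree is not modified while rglob walks it.
    cache_dirs = [p for p in root_path.rglob("__pycache__") if p.is_dir()]
    for d in cache_dirs:
        shutil.rmtree(d, ignore_errors=True)
        removed += 1
    # Any .pyc files left outside __pycache__ directories (e.g. legacy Python 2 layout).
    for f in list(root_path.rglob("*.pyc")):
        if f.is_file():
            f.unlink()
            removed += 1
    return removed
```

Call clean_pycache(".") from the repo root. It returns how many paths it removed.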
This repository aims to provide a convenient, easily deployable and scalable REST API for the InsightFace face detection and recognition pipeline, using FastAPI for serving and NVIDIA TensorRT for optimized inference.
The code is heavily based on the API code in the official DeepInsight InsightFace repository.
This repository provides source code for building the face recognition REST API and for converting models to ONNX and TensorRT using Docker.
Key features:
- … SCRFD detectors and PyTorch-based recognition models (glintr100, w600k_r50, w600k_mbf).
- … SCRFD postprocessing implementation.
- … (SCRFD family only).

Detection models:

Model | Auto download | Batch inference | Detection (ms) | Inference (ms) | GPU-Util (%) | Source | ONNX File |
---|---|---|---|---|---|---|---|
retinaface_r50_v1 | Yes* | | 12.3 | 8.4 | 26 | official package | link |
retinaface_mnet025_v1 | Yes* | | 8.6 | 4.6 | 17 | official package | link |
retinaface_mnet025_v2 | Yes* | | 8.8 | 4.9 | 17 | official package | link |
mnet_cov2 | Yes* | | 8.7 | 4.6 | 18 | mnet_cov2 | link |
centerface | Yes | | 10.6 | 3.5 | 19 | Star-Clouds/CenterFace | link |
scrfd_10g_bnkps | Yes* | Yes | 3.3 | 2 | 16 | SCRFD | link |
scrfd_2.5g_bnkps | Yes* | Yes | 2.2 | 1.1 | 13 | SCRFD | link |
scrfd_500m_bnkps | Yes* | Yes | 1.9 | 0.8 | 13 | SCRFD | link |
scrfd_10g_gnkps | Yes* | Yes | 3.3 | 2.2 | 17 | SCRFD** | link |
scrfd_2.5g_gnkps | Yes* | Yes | 2.3 | 1.2 | 14 | SCRFD** | link |
scrfd_500m_gnkps | Yes* | Yes | 2.1 | 1.3 | 14 | SCRFD** | link |
yolov5s-face | Yes* | Yes | | | | yolov5-face | link |
yolov5m-face | Yes* | Yes | | | | yolov5-face | link |
yolov5l-face | Yes* | Yes | | | | yolov5-face | link |
Note: Performance metrics were measured on an NVIDIA RTX 2080 SUPER + Intel Core i7-5820K (3.3 GHz, 6 cores) for api/src/test_images/lumia.jpg with force_fp16=True, det_batch_size=1 and max_size=640,640. Detection time includes inference, pre- and postprocessing, but does not include image reading, decoding and resizing.
Note 2: SCRFD family models require the input image shape to be divisible by 32, e.g. 640x640 or 1024x768.
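One way to meet this constraint is to scale the image to fit inside a max_size canvas whose sides are multiples of 32 and zero-pad the remainder. The helper below only computes the shapes. Its name, and the choice to keep the aspect ratio and pad up to the next multiple of 32, are assumptions for illustration, not this repo's exact preprocessing.

```python
import math
from typing import Tuple


def fit_and_pad_shape(img_w: int, img_h: int, max_w: int = 640, max_h: int = 640,
                      stride: int = 32) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Return (scale, resized (w, h), padded (w, h)) for a SCRFD-style input.

    Hypothetical helper: the image is scaled to fit inside max_w x max_h while
    keeping its aspect ratio, then each side is rounded up to a multiple of stride.
    """
    if max_w % stride or max_h % stride:
        raise ValueError(f"max size {max_w}x{max_h} must be divisible by {stride}")
    scale = min(max_w / img_w, max_h / img_h)
    new_w = int(round(img_w * scale))
    new_h = int(round(img_h * scale))
    # Round each side up to the stride; the extra area would be zero-padded.
    pad_w = math.ceil(new_w / stride) * stride
    pad_h = math.ceil(new_h / stride) * stride
    return scale, (new_w, new_h), (pad_w, pad_h)
```

For example, fit_and_pad_shape(1920, 1080) yields a 640x360 resized image on a 640x384 canvas. If the TensorRT engine was built for a fixed input size, pad to the full max_size instead.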
Recognition models:

Model | Auto download | Batch inference | Inference, batch 1 (ms) | Inference, batch 64 (ms) | Source | ONNX File |
---|---|---|---|---|---|---|
arcface_r100_v1 | Yes* | Yes | 2.6 | 54.8 | official package | link |
r100-arcface-msfdrop75 | No | Yes | - | - | SubCenter-ArcFace | None |
r50-arcface-msfdrop75 | No | Yes | - | - | SubCenter-ArcFace | None |
glint360k_r100FC_1.0 | No | Yes | - | - | Partial-FC | None |
glint360k_r100FC_0.1 | No | Yes | - | - | Partial-FC | None |
glintr100 | Yes* | Yes | 2.6 | 54.7 | official package | link |
w600k_r50 | Yes* | Yes | 1.9 | 33.2 | official package | link |
w600k_mbf | Yes* | Yes | 0.7 | 9.9 | official package | link |
adaface_ir101_webface12m | Yes* | Yes | - | - | AdaFace repo | link |
Other models:

Model | Auto download | Inference code | Source | ONNX File |
---|---|---|---|---|
genderage_v1 | Yes* | Yes | official package | link |
mask_detector | Yes* | Yes | Face-Mask-Detection | link |
mask_detector112 | Yes* | Yes | Face-Mask-Detection*** | link |
2d106det | No | No | coordinateReg | None |
* Models will be downloaded from Google Drive, which might be inaccessible in some regions, such as China.

** Custom models retrained for this repo. The original SCRFD models have a bug (deepinsight/insightface#1518) with detecting large faces occupying more than 40% of the image. These models are retrained with Group Normalization instead of Batch Normalization, which fixes the bug, though at the cost of some accuracy.
Model accuracy on the WiderFace benchmark:
Model | Easy | Medium | Hard |
---|---|---|---|
scrfd_10g_gnkps | 95.51 | 94.12 | 82.14 |
scrfd_2.5g_gnkps | 93.57 | 91.70 | 76.08 |
scrfd_500m_gnkps | 88.70 | 86.11 | 63.57 |
*** Custom model retrained for 112x112 input size to remove excessive resize operations and improve performance.
Run deploy_trt.sh from the repo's root, editing its settings if needed. If you have multiple GPUs with enough GPU memory, you can try running multiple containers by editing the n_gpu and n_workers parameters in deploy_trt.sh.
By default, the container is configured to build TRT engines without FP16 support. To enable it, change the value of force_fp16 to True in deploy_trt.sh. Keep in mind that your GPU should support fast FP16 inference (NVIDIA GPUs of the RTX 20xx series and above, or server GPUs like Tesla P100, T4, etc.).
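Whether a GPU has fast FP16 can be approximated from its CUDA compute capability. The function below is a hypothetical heuristic and not part of deploy_trt.sh. It assumes that compute capability 6.0 (Tesla P100) and 7.0 or newer (Volta, Turing GPUs such as the RTX 20xx and T4, Ampere and later) qualify, while other Pascal parts such as 6.1 (GTX 10xx) do not. If PyTorch is installed, torch.cuda.get_device_capability() returns the (major, minor) tuple it expects.

```python
from typing import Tuple


def has_fast_fp16(compute_capability: Tuple[int, int]) -> bool:
    """Heuristic: does a desktop/server NVIDIA GPU run FP16 fast?

    Assumption: Tesla P100 (6.0) and all GPUs with compute capability 7.0+
    qualify; other Pascal and older GPUs (e.g. 6.1, GTX 10xx) do not.
    Embedded Jetson modules are not considered.
    """
    major, minor = compute_capability
    return major >= 7 or (major, minor) == (6, 0)


def suggested_force_fp16(compute_capability: Tuple[int, int]) -> str:
    """Value to write as force_fp16 in deploy_trt.sh ("True" or "False")."""
    return "True" if has_fast_fp16(compute_capability) else "False"
```

For a T4, (7, 5), this suggests force_fp16=True. For a GTX 1080, (6, 1), it suggests leaving the default of False.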
If you want to test the API in a non-GPU environment, you can run the service with the deploy_cpu.sh script. In this case, ONNXRuntime will be used as the inference backend.
For a pure MXNet-based version without TensorRT support, you can check the deprecated v0.5.0 branch.
For an example of API usage, please refer to the demo_client.py code.
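As a rough sketch of what a client does, the code below builds a JSON body for the /extract endpoint and posts it using only the standard library. The embed_only, limit_faces and api_ver fields are the ones described in this README's changelog. The images.data layout (base64-encoded image bytes), the meaning of limit_faces=0, and the function names are assumptions. Check demo_client.py or the interactive docs that FastAPI serves (by default at /docs) for the exact schema.

```python
import base64
import json
import urllib.request
from typing import List


def build_extract_payload(image_bytes: List[bytes], *, embed_only: bool = False,
                          limit_faces: int = 0, api_ver: str = "1") -> dict:
    """Build a JSON body for /extract (hypothetical helper).

    ASSUMPTION: images are sent base64-encoded under images.data.
    """
    return {
        "images": {"data": [base64.b64encode(b).decode("ascii") for b in image_bytes]},
        "embed_only": embed_only,    # True: inputs are 112x112 face crops, detection is skipped
        "limit_faces": limit_faces,  # assumed: 0 means "no limit"
        "api_ver": api_ver,          # "2" selects the newer output format
    }


def post_extract(base_url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST the payload to {base_url}/extract and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage against a running service might look like post_extract("http://localhost:18081", build_extract_payload([pathlib.Path("src/api_trt/test_images/Stallone.jpg").read_bytes()])). The host and port here are assumptions, so use whatever deploy_trt.sh exposes.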
Known issue: if the glintr100 recognition model is used, the genderage model returns wrong predictions.

Since many updates have happened since the last release, the version is bumped straight to v0.7.0.0.
Compared to the previous release (v0.6.2.0), this release brings improved performance for SCRFD-based detectors.
Here is a performance comparison on an NVIDIA RTX 2080 Super GPU for the scrfd_10g_gnkps detector paired with the glintr100 recognition model (all tests use src/api_trt/test_images/Stallone.jpg, 1 face per image):
Num workers | Client threads | FPS v0.6.2.0 | FPS v0.7.0.0 | Speed-up |
---|---|---|---|---|
1 | 1 | 56 | 103 | 83.9% |
1 | 30 | 72 | 128 | 77.7% |
6 | 30 | 145 | 179 | 23.4% |
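The Speed-up column is the relative FPS increase of v0.7.0.0 over v0.6.2.0, i.e. (new / old - 1) x 100%. The snippet below recomputes it from the table. From the rounded FPS values it gives 83.9%, 77.8% and 23.4%. The 0.1 percentage point difference in the second row is likely due to rounding of the published FPS numbers.

```python
def speed_up_percent(fps_old: float, fps_new: float) -> float:
    """Relative throughput gain of the new release over the old one, in percent."""
    if fps_old <= 0:
        raise ValueError("fps_old must be positive")
    return (fps_new / fps_old - 1.0) * 100.0


# Rows of the table: (workers, client threads, FPS v0.6.2.0, FPS v0.7.0.0)
ROWS = [(1, 1, 56, 103), (1, 30, 72, 128), (6, 30, 145, 179)]

for workers, threads, old, new in ROWS:
    print(f"{workers} workers, {threads} threads: {speed_up_percent(old, new):.1f}%")
```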
Additions:
- Model Zoo: … w600k_r50 and w600k_mbf …
- scrfd based models now support batch dimension.

Improvements:
- … lumia.jpg example with scrfd_10g_gnkps and threshold = 0.3 (432 faces detected).
- … face_align.norm_crop implementation with help of Numba and removal of unused computations (cropping 432 faces from the lumia.jpg example took 45 ms vs. 205 ms).

Fixes:
REST-API:
- … httpx lib for retrieving images by URLs instead of urllib3 (which caused a performance drop in multi-GPU environments under load due to excessive use of open sockets).

REST-API:
- …

REST-API:
- … scrfd_500m_bnkps, scrfd_2.5g_bnkps, scrfd_10g_bnkps …
- … scrfd_500m_gnkps, scrfd_2.5g_gnkps, scrfd_10g_gnkps …
- … glintr100 …
- … glintr100 and scrfd_10g_gnkps …

REST-API:
- … genderage model.
- … limit_faces parameter in extract endpoint.
- … /multipart/draw_detections endpoint, supporting image upload using multipart form data.
- … draw_detections endpoints.
- … extract endpoint for debug and logging purposes.

REST-API:
- … 'api_ver':'2' in request body. In future versions this parameter will be moved to the path, like /v2/extract, and will become the default output format.

REST-API & conversion scripts:
- …
REST-API & conversion scripts:
- … force_fp16 flag. Now a model with FP16 precision is built only when the flag is set to True. Otherwise, FP32 is used even on GPUs with fast FP16 support.

REST-API:
- … embed_only to /extract endpoint. When set to true, input images are processed as face crops, omitting the detection phase. Expects 112x112 face crops.
- … draw_landmarks to /draw_detections endpoint.

REST-API:
- … /extract endpoint.

REST-API & conversion scripts:
- … glint360k_r100FC_1.0 and glint360k_r100FC_0.1 face recognition models.

REST-API:
- … TensorRT:20.12-py3.
- … r50-arcface-msfdrop75 face recognition model.

Conversion scripts:
- …

REST-API:
- …

Conversion scripts:
- …

REST-API:
- …

Conversion scripts:
- …

Conversion scripts:
- … (… models/mxnet/mnet_cov2)

REST-API:
- … (… force_fp16 flag in deploy_trt.sh)

Conversion scripts:
- …

REST-API:
- … (… src/api_trt … src/Dockerfile_trt)
- … deploy_trt.sh …
- The TensorRT version contains MXNet and ONNXRuntime compiled for CPU for testing and conversion purposes.

Conversion scripts:
- …

REST-API:
- …
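The face_align.norm_crop speed-up listed under the v0.7.0.0 improvements concerns face alignment. In the InsightFace approach, a similarity transform (rotation, uniform scale, translation) is estimated from the five detected landmarks to a fixed 112x112 reference, and the image is warped with it. The plain-Python sketch below shows only the estimation step, as a closed-form least-squares fit, and it is not this repo's Numba implementation. ARCFACE_REF is the commonly used ArcFace 112x112 landmark template (assumed here), and the helper names are hypothetical. The returned 2x3 matrix would then be converted to a NumPy array and passed to a warp such as OpenCV's cv2.warpAffine(img, M, (112, 112)).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumed ArcFace 112x112 reference landmarks:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
ARCFACE_REF: List[Point] = [
    (38.2946, 51.6963), (73.5318, 51.5014), (56.0252, 71.7366),
    (41.5493, 92.3655), (70.7299, 92.2041),
]


def estimate_similarity(src: Sequence[Point],
                        dst: Sequence[Point] = ARCFACE_REF) -> List[List[float]]:
    """Least-squares 2D similarity transform mapping src points onto dst points.

    Returns a 2x3 matrix M with u = M[0][0]*x + M[0][1]*y + M[0][2] and
    v = M[1][0]*x + M[1][1]*y + M[1][2].
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matching point pairs")
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    mu = sum(q[0] for q in dst) / n
    mv = sum(q[1] for q in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        xc, yc, uc, vc = x - mx, y - my, u - mu, v - mv  # centered coordinates
        num_a += xc * uc + yc * vc
        num_b += xc * vc - yc * uc
        den += xc * xc + yc * yc
    if den == 0.0:
        raise ValueError("source points are degenerate (all identical)")
    a, b = num_a / den, num_b / den  # a = s*cos(theta), b = s*sin(theta)
    # Translation maps the source centroid onto the destination centroid.
    tx = mu - (a * mx - b * my)
    ty = mv - (b * mx + a * my)
    return [[a, -b, tx], [b, a, ty]]


def transform_point(M: List[List[float]], p: Point) -> Point:
    """Apply a 2x3 affine matrix to a single point."""
    x, y = p
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])
```

Parameterizing the transform as [[a, -b], [b, a]] rules out reflections and shearing, so only rotation, uniform scale and translation are fitted, which is what keeps aligned crops geometrically consistent.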