deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

Getting an inference time of 31.227 ms for SCRFD_500M vs the claimed 3.6 ms. How? #1761

Open dexception opened 2 years ago

dexception commented 2 years ago

[screenshot: benchmark loop code]

As you can see, I ran the loop 10 times to confirm the result.

[screenshot: timing output, roughly 31.227 ms per run]

The results are way off from the time you reported. Am I missing something?

Note: the ONNX file was exported from the model.pth file given at the following link: https://1drv.ms/u/s!AswpsDO2toNKqyYWxScdiTITY4TQ?e=DjXof9

nttstar commented 2 years ago

Make sure you are using onnxruntime-gpu.
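
A quick way to verify which execution providers the installed build exposes (a minimal check; assumes a CUDA-capable machine):

python -c "import onnxruntime as ort; print(ort.get_available_providers())"

CUDAExecutionProvider should appear in the list; if only CPUExecutionProvider shows up, the CPU-only package is installed.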

joytsay commented 2 years ago

@nttstar I'm facing the same problem with scrfd_500m.onnx. My onnxruntime-gpu appears to be active, checked by:

python -c "import onnxruntime as ort; print(ort.get_device())"
>>> GPU

I performed 10 runs:

all cost: 15.78
all cost: 11.540999999999999
all cost: 9.548
all cost: 9.395000000000001
all cost: 11.373
all cost: 10.758000000000001
all cost: 10.706
all cost: 11.953
all cost: 12.467
all cost: 12.462000000000002

My GPU is an RTX 3080; you claim 3.6 ms on an AMD Ryzen 9 3950X.

joytsay commented 2 years ago

To answer my own question above: I came upon https://github.com/deepinsight/insightface/tree/master/python-package and used this instead:

pip install -U insightface

after doing:

pip install onnxruntime-gpu

I ran https://github.com/deepinsight/insightface/blob/master/python-package/insightface/model_zoo/scrfd.py and changed

https://github.com/deepinsight/insightface/blob/06897de50e327e01a33582955d5cb4222d0e67b5/python-package/insightface/model_zoo/scrfd.py#L321 to

detector = SCRFD(model_file='/root/.insightface/models/buffalo_m/det_2.5g.onnx')
detector.prepare(0) # original ctx_id -1 is for CPU, 0 is for GPU id 

and also changed https://github.com/deepinsight/insightface/blob/06897de50e327e01a33582955d5cb4222d0e67b5/python-package/insightface/model_zoo/scrfd.py#L330 to

bboxes, kpss = detector.detect(img, input_size = (640, 640))
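
Putting both changes together as a standalone script (a minimal sketch of my setup; the model path, test image name, and loop count are assumptions):

import time
import cv2
from insightface.model_zoo.scrfd import SCRFD

# model file downloaded by the insightface python-package (buffalo_m bundle)
detector = SCRFD(model_file='/root/.insightface/models/buffalo_m/det_2.5g.onnx')
detector.prepare(0)  # ctx_id 0 = first GPU; -1 would select CPU

img = cv2.imread('t1.jpg')  # hypothetical test image
detector.detect(img, input_size=(640, 640))  # warm-up, absorbs the GPU cold run

for _ in range(10):
    t0 = time.time()
    bboxes, kpss = detector.detect(img, input_size=(640, 640))
    print('all cost:', (time.time() - t0) * 1000)  # per-run time in ms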

The results were:

all cost: 733.7719999999999 (GPU cold run)
all cost: 5.401999999999999 (run 1)
all cost: 4.316000000000001 (run 2)

Here I used SCRFD_2.5G and got 4.31 ms, which is reasonable on my RTX 3080.

nttstar commented 2 years ago

@joytsay Yes, and this 4.31 ms includes post-processing.

QAQEthan commented 2 years ago

@joytsay Did you solve this problem? The inference time for scrfd_500m.onnx is way off from the paper.

joytsay commented 2 years ago

@Monkey-D-Luffy-star Yes, here are my inference times on the RTX 3080 (P.S. scrfd_500m.onnx is SCRFD_0.5GF):

Model         Backbone          Input      RTX 3080 (Linux)
CenterFace    MobileNetV2       800x800    8.55 ms
RetinaFace    MobileNet0.25     640x640    22.19 ms
SCRFD_0.5GF   Depth-wise Conv   640x640    3.625 ms
SCRFD_2.5GF   Basic Res         640x640    4.239 ms
SCRFD_10GF    Basic Res         640x640    5.875 ms

QAQEthan commented 2 years ago

@joytsay Thanks. How did you solve this problem?

joytsay commented 2 years ago

As mentioned above, I used the python-package with onnxruntime-gpu installed. My Docker environment is:

docker pull nvcr.io/nvidia/mxnet:21.09-py3

(since I wanted to benchmark RetinaFace in an MXNet environment)

and after starting the container with:

docker run --gpus all --shm-size=8g -it -v $PWD:/insight-dir nvcr.io/nvidia/mxnet:21.09-py3 bash

I repeated the steps from my earlier comment inside the MXNet container.

QAQEthan commented 2 years ago

@joytsay Thank you for your answer. I would like to know the cause of your earlier slow results, like these:

all cost: 15.78
all cost: 11.540999999999999
all cost: 9.548
all cost: 9.395000000000001
all cost: 11.373
all cost: 10.758000000000001
all cost: 10.706
all cost: 11.953
all cost: 12.467
all cost: 12.462000000000002

joytsay commented 2 years ago

This was due to using conda directly on Ubuntu. Somehow onnxruntime didn't use the GPU even when it said it did:


python -c "import onnxruntime as ort; print(ort.get_device())"
>>> GPU

I ended up using Docker instead.
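
On recent onnxruntime versions, a more reliable check than ort.get_device() is to ask a session which execution providers it actually ended up with (a minimal sketch, assuming onnxruntime-gpu and a local scrfd_500m.onnx):

import onnxruntime as ort

sess = ort.InferenceSession(
    'scrfd_500m.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)
# If CUDA fails to initialize, onnxruntime silently falls back to CPU;
# the session's active provider list reveals what is really being used.
print(sess.get_providers())  # CUDAExecutionProvider should be listed first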

QAQEthan commented 2 years ago

Well, maybe I have the same problem as you; I'll try it. Thanks.

QAQEthan commented 2 years ago

@joytsay Hi bro, as you said, onnxruntime couldn't use the GPU, but I found this was due to an onnxruntime version issue. I had installed too high a version; when I lowered it (to onnxruntime 1.4), it ran successfully on the GPU. But the results are still a little different from yours and fluctuate greatly. Do you think this is caused by the version problem? Here are the scrfd_2.5g.onnx results on a 2080 Ti:

all cost: 9.543
all cost: 9.399
all cost: 9.32
all cost: 13.374
all cost: 25.564
all cost: 26.359
all cost: 27.944
all cost: 27.301
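
For reference, the downgrade described above would look something like this (version 1.4.0 per the comment; matching the build to your CUDA/cuDNN install is an assumption left to the reader):

pip uninstall -y onnxruntime onnxruntime-gpu
pip install onnxruntime-gpu==1.4.0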

jiangxiangchuan commented 2 years ago

@joytsay @nttstar My GPU is an RTX 3080, and my CPU is an Intel(R) Xeon(R) Silver 4310 @ 2.10GHz. The model file is scrfd_10g_bnkps.onnx. I ran https://github.com/deepinsight/insightface/blob/master/python-package/insightface/model_zoo/scrfd.py, and the total "all cost" time was about 10 ms, while the paper reports 5 ms. Then I timed only the session call in the forward function of the SCRFD class ("net_outs = self.session.run(self.output_names, {self.input_name: blob})") and got around 5 ms.
So my question is: does the inference time mentioned in the paper include the post-processing time? Or is my CPU just slower? Or is there some other reason?
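
For reference, isolating the session.run call from the rest of detect() might look like this (a minimal sketch; the random blob is a stand-in for the real pre-processed input):

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('scrfd_10g_bnkps.onnx',
                            providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name
output_names = [o.name for o in sess.get_outputs()]

blob = np.random.randn(1, 3, 640, 640).astype(np.float32)  # stand-in for the pre-processed image
sess.run(output_names, {input_name: blob})  # warm-up

n = 100
t0 = time.time()
for _ in range(n):
    sess.run(output_names, {input_name: blob})
print('session.run mean: %.3f ms' % ((time.time() - t0) / n * 1000))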

wenzhengzeng commented 1 year ago

@jiangxiangchuan I obtained a similar result on a 3090. I think the reported time (i.e., 4.9 ms) covers only the session.run call; the time for data pre-processing is not included.