dexception opened this issue 2 years ago
Make sure you are using onnxruntime-gpu.
@nttstar I'm facing the same problem with scrfd_500m.onnx. My onnxruntime-gpu is active, as checked by:
python -c "import onnxruntime as ort; print(ort.get_device())"
>>> GPU
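(As it turns out later in this thread, get_device() alone is not a reliable check. A small extra step, assuming onnxruntime-gpu is installed, is to also list the available execution providers; a minimal sketch:)

import onnxruntime as ort

# "GPU" here only means the installed build supports GPU,
# not that a given session will actually run on it.
print(ort.get_device())

# CUDAExecutionProvider must be listed here for GPU inference to be possible at all.
print(ort.get_available_providers())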
I performed 10 runs:
all cost: 15.78
all cost: 11.540999999999999
all cost: 9.548
all cost: 9.395000000000001
all cost: 11.373
all cost: 10.758000000000001
all cost: 10.706
all cost: 11.953
all cost: 12.467
all cost: 12.462000000000002
My GPU is an RTX 3080 on an AMD Ryzen 9 3950X; you claim 3.6 ms.
To answer my own question above, I came upon https://github.com/deepinsight/insightface/tree/master/python-package and used that instead:
pip install -U insightface
after first doing:
pip install onnxruntime-gpu
I ran https://github.com/deepinsight/insightface/blob/master/python-package/insightface/model_zoo/scrfd.py and changed the detector setup to:
detector = SCRFD(model_file='/root/.insightface/models/buffalo_m/det_2.5g.onnx')
detector.prepare(0) # original ctx_id -1 is for CPU, 0 is for GPU id
and also changed https://github.com/deepinsight/insightface/blob/06897de50e327e01a33582955d5cb4222d0e67b5/python-package/insightface/model_zoo/scrfd.py#L330 to
bboxes, kpss = detector.detect(img, input_size = (640, 640))
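Putting those two changes together, a minimal sketch of the benchmark I ran looks roughly like this (the model path and test image are assumptions; SCRFD comes from the python-package's model_zoo/scrfd.py):

import datetime
import cv2
from insightface.model_zoo.scrfd import SCRFD

# assumed local paths; adjust to your own model and test image
detector = SCRFD(model_file='/root/.insightface/models/buffalo_m/det_2.5g.onnx')
detector.prepare(0)  # ctx_id 0 selects the GPU, -1 would select the CPU
img = cv2.imread('t1.jpg')

for _ in range(10):
    ta = datetime.datetime.now()
    bboxes, kpss = detector.detect(img, input_size=(640, 640))
    tb = datetime.datetime.now()
    # per-call latency in milliseconds; the first iteration includes GPU warm-up
    print('all cost:', (tb - ta).total_seconds() * 1000)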
The results were:
all cost: 733.7719999999999 (#gpu cold run )
all cost: 5.401999999999999 (#run1)
all cost: 4.316000000000001 (#run2)
Here I used SCRFD_2.5G and got 4.31 ms, which is reasonable on my RTX 3080.
@joytsay Yes, and this 4.31 ms includes post-processing.
@joytsay Did you solve this problem? The inference time on scrfd_500m.onnx is way off from the paper.
@Monkey-D-Luffy-star Yes, here are my inference times on the RTX 3080 (p.s. scrfd_500m.onnx is SCRFD_0.5GF):
| Model | Backbone | Input | RTX 3080 (Linux) |
|---|---|---|---|
| CenterFace | MobileNetV2 | 800x800 | 8.55 ms |
| RetinaFace | MobileNet0.25 | 640x640 | 22.19 ms |
| SCRFD_0.5GF | Depth-wise Conv | 640x640 | 3.625 ms |
| SCRFD_2.5GF | Basic Res | 640x640 | 4.239 ms |
| SCRFD_10GF | Basic Res | 640x640 | 5.875 ms |
@joytsay Thanks. How did you solve this problem?
As mentioned above, I used the python-package with onnxruntime-gpu installed.
My Docker environment is:
docker pull nvcr.io/nvidia/mxnet:21.09-py3
(since I wanted to benchmark RetinaFace in an mxnet environment). After starting the container with:
docker run --gpus all --shm-size=8g -it -v $PWD:/insight-dir nvcr.io/nvidia/mxnet:21.09-py3 bash
I did the steps from my earlier comment (pip install onnxruntime-gpu, pip install -U insightface, then run scrfd.py with the changes above) inside the mxnet container.
@joytsay Thank you for your answer. I would like to know the reason for your earlier slow results (the ~9-16 ms runs you posted above).
This was due to using conda directly on Ubuntu. Somehow onnxruntime couldn't actually use the GPU even though it claimed it could:
python -c "import onnxruntime as ort; print(ort.get_device())"
>>> GPU
I ended up using Docker instead.
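If you hit the same thing, a more direct check than get_device() is to look at which providers the session actually ends up with; a minimal sketch, with the model path assumed:

import onnxruntime as ort

# assumed model path, for illustration only
session = ort.InferenceSession(
    'det_2.5g.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

# if CUDA cannot be loaded, onnxruntime falls back to the CPU and
# only CPUExecutionProvider will show up here
print(session.get_providers())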
Well, maybe I have the same problem as you; I'll try it. Thanks.
@joytsay Hi bro, as you said, onnxruntime couldn't use the GPU, but I found this to be due to an onnxruntime version issue. I had installed too high an onnxruntime version; when I lowered it (to onnxruntime 1.4), it successfully ran on the GPU. However, the results are still a little different from yours and fluctuate a lot. Do you think this is caused by the version problem? Here are the scrfd2.5g.onnx results on a 2080 Ti:
all cost: 9.543
all cost: 9.399
all cost: 9.32
all cost: 13.374
all cost: 25.564
all cost: 26.359
all cost: 27.944
all cost: 27.301
@joytsay @nttstar My GPU is an RTX 3080 and my CPU is an Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz. The model file is scrfd_10g_bnkps.onnx. I ran https://github.com/deepinsight/insightface/blob/master/python-package/insightface/model_zoo/scrfd.py, and the "all cost" time is about 10 ms, while the time mentioned in the paper is 5 ms. Then I timed just the step in the forward function of the SCRFD class, "net_outs = self.session.run(self.output_names, {self.input_name : blob})", and got around 5 ms.
So my question is: does the inference time mentioned in the paper include the post-processing time? Or is my CPU performance lower? Or is there some other reason?
@jiangxiangchuan I obtained a similar result on a 3090. I think the reported time (i.e., 4.9 ms) covers just the session.run call; the time for data pre-processing is not included.
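One way to see that split yourself is to time the full detect call and the bare session.run separately; a rough sketch, with the model path and test image assumed, and the blob construction copied in simplified form from scrfd.py's forward():

import time
import cv2
from insightface.model_zoo.scrfd import SCRFD

detector = SCRFD(model_file='scrfd_10g_bnkps.onnx')  # assumed local path
detector.prepare(0)
img = cv2.imread('t1.jpg')  # assumed test image

# full pipeline: pre-processing + session.run + post-processing
t0 = time.perf_counter()
bboxes, kpss = detector.detect(img, input_size=(640, 640))
t1 = time.perf_counter()

# bare network call, using the same normalization as scrfd.py (scale 1/128, mean 127.5)
blob = cv2.dnn.blobFromImage(cv2.resize(img, (640, 640)), 1.0 / 128,
                             (640, 640), (127.5, 127.5, 127.5), swapRB=True)
t2 = time.perf_counter()
net_outs = detector.session.run(detector.output_names, {detector.input_name: blob})
t3 = time.perf_counter()

print('detect (pre + run + post): %.2f ms' % ((t1 - t0) * 1000))
print('session.run only         : %.2f ms' % ((t3 - t2) * 1000))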
As you can see, I ran the loop 10 times to confirm the result. The results are way off from the times you report. Am I missing something?
Note: the ONNX file was exported from the model.pth file given at the following link: https://1drv.ms/u/s!AswpsDO2toNKqyYWxScdiTITY4TQ?e=DjXof9