trt batch infer results error

SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Apache License 2.0


trt batch infer results error #91

Closed yuxianmin closed 2 years ago

yuxianmin commented 2 years ago

I am testing the TensorRT backend with max_batch_size 2. When I run inference on a single image, the result is correct, but when two different images are passed together, the second image gets wrong detections. If the two images are identical, the results are correct.

What could be the reason for this? Thanks.

Some example errors, using scrfd_500m_bnkps_640_640_batch2.plan:

Input: [ "test_images/Stallone.jpg", "test_images/mask.jpg" ]

Input: [ "test_images/mask.jpg", "test_images/lumia.jpg" ]

SthPhoenix commented 2 years ago

Hi! That's interesting. Could you please run the same test with the yolov5s-face model to narrow down the issue?

SthPhoenix commented 2 years ago

Should be working as expected now ) Absolutely dumb mistake with array offsets )

yuxianmin commented 2 years ago

Thanks for replying so quickly. In addition to the offset problem, there may be another problem that needs fixing. When testing with batch input, I found that the first image also has a problem. After debugging, I found that self.score_list/bbox_list/kpss_list are overwritten by the following image, which corrupts the results for the previous image. After resetting these arrays, the results are correct.

SthPhoenix commented 2 years ago

Actually, those arrays are intentionally not reset. I found that reallocating them might noticeably impact performance, so I allocate the memory once during initialization and then just assign new values.

What kind of errors do you get with the first image? I'll investigate it more thoroughly.

yuxianmin commented 2 years ago

If the arrays are not reset, different images share them. Because the offset starts from 0 for each image during the batch loop, the detection data of the previous image is overwritten by the data of the next one.
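A minimal sketch of the overwrite described above, using NumPy. The `Detector` class, its `postprocess_batch` method, and the score values are hypothetical stand-ins for illustration, not the project's actual code; only the attribute name `score_list` comes from this thread.

```python
import numpy as np


class Detector:
    """Hypothetical stand-in for the detector's postprocessing step."""

    def __init__(self, max_dets=8):
        # Allocated once at initialization and reused for every image.
        self.score_list = np.zeros(max_dets, dtype=np.float32)

    def postprocess_batch(self, batch_scores):
        results = []
        for scores in batch_scores:
            offset = 0  # restarts at 0 for every image in the batch
            n = len(scores)
            self.score_list[offset:offset + n] = scores
            offset += n
            # Basic slicing returns a view into the shared buffer, not a copy.
            results.append(self.score_list[:offset])
        return results


det = Detector()
first, second = det.postprocess_batch([[0.9, 0.8, 0.7], [0.1, 0.2]])
# The first image's leading scores now hold the second image's values.
print(first, second)
```

Because `first` and `second` are both views of the same buffer, writing the second image's data changes what `first` shows.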

For two images, batch=2:

[ "test_images/mask.jpg", "test_images/lumia.jpg" ]

Errors in the first image:

SthPhoenix commented 2 years ago

Yes, I can reproduce this behavior too. Array slices are passed to the output by reference instead of by value. I'll fix it shortly. Thanks for pointing out the issue!
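One way to return slices by value while keeping the preallocated buffer is to copy each slice before handing it out. This is only a sketch under that assumption; the thread does not show the actual fix, and `buffer` and `collect` are hypothetical names.

```python
import numpy as np

# Hypothetical preallocated buffer, reused across images.
buffer = np.zeros(8, dtype=np.float32)


def collect(batch_scores, copy):
    """Write each image's scores into the shared buffer and gather the results."""
    results = []
    for scores in batch_scores:
        n = len(scores)
        buffer[:n] = scores
        view = buffer[:n]  # a view: still backed by the shared buffer
        # .copy() detaches the result so later writes cannot change it.
        results.append(view.copy() if copy else view)
    return results


batch = [[0.9, 0.8, 0.7], [0.1, 0.2]]
by_ref = collect(batch, copy=False)
by_val = collect(batch, copy=True)

print(np.shares_memory(by_ref[0], buffer))  # views alias the buffer
print(np.shares_memory(by_val[0], buffer))  # copies do not
```

The copy only covers the valid detections of one image, so it is typically much smaller than reallocating the full buffers on every call.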

yuxianmin commented 2 years ago

By the way, one more question. When using batch inference (e.g. scrfd_500m_bnkps.onnx) on GPU with onnxruntime-gpu, the batch inference time (inference only, excluding pre- and post-processing) barely improves. Have you encountered this? The model is running on the GPU. Thanks.

SthPhoenix commented 2 years ago

> By the way, ask a question. When using batch inference (eg: scrfd_500m_bnkps.onnx) under gpu with onnxruntime-gpu, the batch inference time(only the inference time, not include pre and post process) is basically not improved. Have you encountered it? The model is also running on the gpu. thanks

I haven't added batch inference for onnxruntime, so it should be processing images one by one.

I remember there were some issues with batch inference enabled and using onnxruntime on CPU, though I can't recall if there were any issues on GPU.

yuxianmin commented 2 years ago

Okay, thanks for your reply!