Closed — HaoqianSong closed this issue 1 year ago
You may think you want to measure the FPS of native PyTorch inference, but in fact you don't.
Here's why: PyTorch eager execution mode is great for debugging and for understanding how a model works. However, it comes at the cost of increased inference latency:
PyTorch's native forward() does not apply any runtime optimizations, such as layer fusion.
Nor does it optimize memory usage by reusing intermediate buffers to reduce VRAM fragmentation.
Even torch tracing would give better performance than a plain PyTorch model.
Measuring inference FPS under these conditions would give you very biased results. If your goal is maximum performance, the recommended way is to use TensorRT or ONNX Runtime, not a plain PyTorch model.
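To make the eager-vs-traced point concrete, here is a minimal sketch using `torch.jit.trace` on a tiny stand-in network (not YOLO-NAS itself): tracing compiles the graph ahead of time, so only the execution path and its latency change, while outputs stay identical.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a real detector backbone (an assumption for illustration).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),
).eval()

dummy = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    eager_out = model(dummy)                 # eager mode: op-by-op dispatch
    traced = torch.jit.trace(model, dummy)   # record the graph once, ahead of time
    traced_out = traced(dummy)

# Same numbers out of both paths; timing either one measures different overheads.
print(torch.allclose(eager_out, traced_out, atol=1e-6))
```

Timing `traced(dummy)` versus `model(dummy)` over many iterations shows the dispatch overhead the reply above refers to; dedicated runtimes such as TensorRT go much further than tracing.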
OK, thanks! How can I output the latency of model inference? For example, in YOLO inference:

```python
with dt[0]:
    im = torch.from_numpy(im).to(model.device)
    im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
    im /= 255  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim

# Inference
with dt[1]:
    visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
    pred = model(im, augment=augment, visualize=visualize)

# NMS
with dt[2]:
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")
t = tuple(x.t / seen * 1E3 for x in dt)  # speeds per image
LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)
```
The output:

```
image 97/98 D:\DetectionAlgorithm\DataSet\YOLO\YOLOinput\images\test\931898198569996288_0.jpg: 384x640 4 persons, 26.0ms
image 98/98 D:\DetectionAlgorithm\DataSet\YOLO\YOLOinput\images\test\96.jpg: 448x640 2 persons, 26.9ms
Speed: 0.5ms pre-process, 30.0ms inference, 1.8ms NMS per image at shape (1, 3, 640, 640)
```
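The `dt[...]` objects in that snippet are YOLOv5-style accumulating timers. A minimal stand-alone sketch of such a timer (not the exact upstream `Profile` implementation, and without the CUDA synchronization the real one performs on GPU) could look like:

```python
import time

class Profile:
    """Accumulating timer, used as `with dt[i]: ...` per pipeline stage."""
    def __init__(self):
        self.t = 0.0   # total accumulated seconds across all uses
        self.dt = 0.0  # duration of the most recent timed block

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.dt = time.perf_counter() - self._start
        self.t += self.dt

dt = (Profile(), Profile(), Profile())  # pre-process, inference, NMS

with dt[1]:
    time.sleep(0.01)  # stand-in for model(im)

print(f"inference: {dt[1].dt * 1E3:.1f}ms")
```

Dividing each timer's `.t` by the number of images processed gives the per-image averages printed in the `Speed:` line. Note that on GPU, timing without synchronizing first measures only kernel launch time, not execution time.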
Please follow our docs describing how to properly benchmark the model: https://docs.deci.ai/super-gradients/documentation/source/BenchmarkingYoloNAS.html
> You may think you want to measure the FPS of native PyTorch inference, but in fact you don't.
> Here's why: PyTorch eager execution mode is great for debugging and for understanding how a model works. However, it comes at the cost of increased inference latency:
> PyTorch's native forward() does not apply any runtime optimizations, such as layer fusion.
> Nor does it optimize memory usage by reusing intermediate buffers to reduce VRAM fragmentation.
> Even torch tracing would give better performance than a plain PyTorch model.
> Measuring inference FPS under these conditions would give you very biased results. If your goal is maximum performance, the recommended way is to use TensorRT or ONNX Runtime, not a plain PyTorch model.
I had a doubt then: did you use TensorRT/ONNX for your benchmarks of the other models (YOLOv8, YOLOv5, YOLOv6, YOLOv7, etc.) in the comparison at https://github.com/Deci-AI/super-gradients/raw/master/documentation/source/images/yolo_nas_frontier.png? Because if not, could they not potentially be faster than YOLO-NAS with TensorRT optimizations?
Have no doubt: we used TensorRT for the GPU benchmarks and OpenVINO for the CPU benchmarks.
Thank you for the information :). I would love to try it on some images using TensorRT, but it is proving difficult to do so.
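Whichever backend you end up with, the measurement methodology the benchmarking docs describe is the same: warm up first, synchronize the GPU around the timed region, and average over many iterations. A hedged sketch of that loop, using a tiny stand-in network (swap in your real model or exported engine):

```python
import time
import torch
import torch.nn as nn

def benchmark(model, inp, warmup=10, iters=50):
    """Return average forward-pass latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warmup: let caches/allocators settle
            model(inp)
        if inp.is_cuda:
            torch.cuda.synchronize()     # flush queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(inp)
        if inp.is_cuda:
            torch.cuda.synchronize()     # wait for async GPU kernels to finish
    return (time.perf_counter() - start) / iters * 1e3

# Tiny stand-in network (an assumption for illustration); runs on CPU as-is.
net = nn.Conv2d(3, 8, 3, padding=1)
x = torch.randn(1, 3, 64, 64)
ms = benchmark(net, x)
print(f"{ms:.2f} ms/image -> {1000.0 / ms:.1f} FPS")
```

Without the `torch.cuda.synchronize()` calls, GPU timings only capture kernel launches, which is one common source of the biased numbers mentioned earlier in the thread.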
🚀 Feature Request
How can the inference time for each image (or the average inference time over all images) be output during inference, to make it easy to compute an FPS metric?
Proposed Solution (Optional)
Output the per-image inference time during inference, or the average inference time over all images, so that an FPS metric can be computed.
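Given per-stage averages like the ones logged above (0.5 ms pre-process, 30.0 ms inference, 1.8 ms NMS per image), FPS is just the reciprocal of the total per-image time:

```python
# Per-image stage times in milliseconds, taken from the log output above.
pre_ms, inf_ms, nms_ms = 0.5, 30.0, 1.8

total_ms = pre_ms + inf_ms + nms_ms  # end-to-end time per image
fps = 1000.0 / total_ms              # images per second

print(f"{total_ms:.1f} ms/image -> {fps:.1f} FPS")  # prints: 32.3 ms/image -> 31.0 FPS
```

Whether to include pre-processing and NMS in the total depends on what you want the metric to reflect; end-to-end FPS (as here) is usually the more honest number.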