jackwei86 opened 2 months ago
I am looking for the same! I need to run SAM2 on a live video stream, but I am getting just ~1.5 fps using an NVIDIA T4.
There is https://github.com/NVIDIA-AI-IOT/nanosam and I am wondering what the differences would be between creating the encoder/decoder .engine files for SAM2 vs. SAM1.
SAM1 can only be used for images, while SAM2 supports both images and videos. I found some information on the NVIDIA forum: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/19. NVIDIA may optimize SAM2 through TAO / DeepStream in the future. I also found FasterTransformer, but the SAM2 model is not included.
Thanks for your reply. I really appreciate it.
The default/unoptimized performance is slow since SAM2 is a large model. Hopefully NVIDIA will optimize the model soon. You can follow this link: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/17
From what I understand, it's possible to convert from ONNX to TensorRT. So it might be worth checking out the ONNX variants of SAM (see the discussion in issue #186), such as this repo: https://github.com/axinc-ai/segment-anything-2/tree/onnx
It's also possible to get a decent speedup (~4x, possibly bigger than using TensorRT on its own) by reducing the input resolution to 512px (see issue #257). Obviously this reduces the segmentation quality, but not by as much as you might expect (at least imo).
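For anyone trying the conversion: the quickest route is usually the `trtexec` CLI that ships with TensorRT (`trtexec --onnx=model.onnx --saveEngine=model.engine --fp16`). Programmatically, here is a minimal C++ sketch of building an engine from an exported ONNX file with the TensorRT 8.x builder API; `image_encoder.onnx` / `image_encoder.engine` are placeholder names, not the actual SAM2 export names:

```cpp
// Minimal sketch: build a TensorRT engine from an ONNX file (TensorRT 8.x API).
// "image_encoder.onnx" is a placeholder; substitute whichever SAM2 ONNX export you have.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <memory>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));

    // ONNX models require an explicit-batch network on TensorRT 8.x.
    uint32_t flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
        builder->createNetworkV2(flags));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));

    if (!parser->parseFromFile("image_encoder.onnx",
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse the ONNX file" << std::endl;
        return 1;
    }

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // FP16 is usually a big win on T4 / RTX cards.

    // Serialize the engine and write it to disk for deserialization at runtime.
    auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    std::ofstream out("image_encoder.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}
```

Note that FP16 can change the masks slightly; if quality drops, rebuild without the flag and compare.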
Thanks for your reply. I have read the post and got the ONNX file. I will try to convert the ONNX to TensorRT.
I implemented exporting all of the ONNX files and video inference through C++ onnxruntime. Inference with C++: https://github.com/Aimol-l/OrtInference. Export the ONNX files: https://github.com/Aimol-l/SAM2Export
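In case it helps anyone before cloning the repos above, a minimal onnxruntime C++ sketch for running one exported SAM2 ONNX file looks roughly like this. The file name, the 1x3x1024x1024 input shape, and the single input/output are assumptions; query the real names and shapes from your own export:

```cpp
// Minimal sketch: run a single exported ONNX model with the onnxruntime C++ API.
// "image_encoder.onnx" and the input shape are placeholders for your SAM2 export.
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "sam2");
    Ort::SessionOptions opts;
    // Optional: enable the CUDA execution provider; otherwise inference runs on CPU.
    // OrtSessionOptionsAppendExecutionProvider_CUDA(opts, 0);

    Ort::Session session(env, "image_encoder.onnx", opts);

    // A normalized 1x3x1024x1024 image tensor (matching image_size: 1024 above).
    std::vector<int64_t> shape{1, 3, 1024, 1024};
    std::vector<float> input(1 * 3 * 1024 * 1024, 0.0f);
    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        mem, input.data(), input.size(), shape.data(), shape.size());

    // Input/output names depend on how the model was exported; query them at runtime.
    Ort::AllocatorWithDefaultOptions alloc;
    auto in_name  = session.GetInputNameAllocated(0, alloc);
    auto out_name = session.GetOutputNameAllocated(0, alloc);
    const char* in_names[]  = {in_name.get()};
    const char* out_names[] = {out_name.get()};

    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               in_names, &tensor, 1, out_names, 1);
    auto dims = outputs[0].GetTensorTypeAndShapeInfo().GetShape();
    std::cout << "output rank: " << dims.size() << std::endl;
    return 0;
}
```

Chaining the exported pieces (image encoder, memory attention, mask decoder, ...) is then just a matter of passing the `Ort::Value` outputs of one `Run()` call as inputs to the next session.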
Do you have any plans to further optimize FPS with TensorRT later on? Right now, on an RTX 4060, I can only reach 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS do you get with C++ onnxruntime inference?
4070 Ti Super with CUDA inference, and I get about the same speed. https://www.bilibili.com/video/BV1iV4sesEak/?vd_source=53feb81d41ae94687975addeae931a0a
Thanks, I've followed you on Bilibili. Your video has a very detailed introduction to the transformer model used in SAM2 (encoder, decoder, attention), which is exactly what I need.
I'm not quite familiar with the Transformer model. There are more steps involved than with other models because of the encoder and decoder, such as feeding the output of the last encoder block as the input to the next block, etc. Does anyone have an example of using C++ with NVIDIA TensorRT?
To run Segment Anything Model 2 on a live video stream, I have followed segment-anything-2-real-time. I need to implement the functionality in C++ with NVIDIA TensorRT for low latency.
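I haven't seen an official SAM2 TensorRT sample either, but as a rough sketch, deserializing a prebuilt .engine and running one inference with the TensorRT 8.5+ C++ runtime looks like this. The file name, the tensor names ("image" / "embeddings"), and the buffer sizes are placeholders; read the real ones from the engine via `engine->getNbIOTensors()` and `engine->getIOTensorName(i)`:

```cpp
// Minimal sketch: deserialize a prebuilt TensorRT engine and run one inference
// (TensorRT 8.5+ tensor-name API). File, tensor names, and sizes are placeholders.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    // Load the serialized engine produced by trtexec or the build sketch above.
    std::ifstream file("image_encoder.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(
        engine->createExecutionContext());

    // Device buffers; the sizes here assume a 1x3x1024x1024 input and are illustrative only.
    void* input = nullptr;
    void* output = nullptr;
    cudaMalloc(&input, 1 * 3 * 1024 * 1024 * sizeof(float));
    cudaMalloc(&output, 1 * 256 * 64 * 64 * sizeof(float));

    // Bind device pointers to I/O tensors by name, then enqueue on a CUDA stream.
    context->setTensorAddress("image", input);        // assumed input tensor name
    context->setTensorAddress("embeddings", output);  // assumed output tensor name

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);  // the output buffer is now ready to copy back

    cudaFree(input);
    cudaFree(output);
    cudaStreamDestroy(stream);
    return 0;
}
```

For a live stream, you would keep the engine, context, and device buffers alive across frames and only repeat the copy/enqueue steps per frame, which is where most of the latency win over naive per-frame setup comes from.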