facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Is it possible to use NVIDIA TensorRT to accelerate SAM2 inference with a C++ implementation? #284

Open jackwei86 opened 2 months ago

jackwei86 commented 2 months ago

I'm not very familiar with Transformer models. Compared with other models, the encoder and decoder involve extra steps, e.g. the output of one encoder block has to be fed as the input of the next encoder block. Does anyone have an example of doing this in C++ with NVIDIA TensorRT?

To run Segment Anything Model 2 on a live video stream, I have followed segment-anything-2-real-time. I need to implement the same functionality in C++ with NVIDIA TensorRT for low latency.

Henistein commented 2 months ago

I am looking for the same! I need to run SAM2 on a live video stream, but I am getting just ~1.5 fps on an NVIDIA T4.

free-soellingeraj commented 2 months ago

There is https://github.com/NVIDIA-AI-IOT/nanosam, and I am wondering what the differences would be in creating the encoder/decoder .engine files for SAM2 vs SAM1.

jackwei86 commented 2 months ago

> There is https://github.com/NVIDIA-AI-IOT/nanosam, and I am wondering what the differences would be in creating the encoder/decoder .engine files for SAM2 vs SAM1.

SAM1 can only be used for images, while SAM2 supports both images and videos. I found some information on the NVIDIA forum: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/19. NVIDIA may optimize SAM2 through TAO / DeepStream in the future. I also found FasterTransformer, but the SAM2 model is not included there.

Thanks for your reply. I really appreciate it.

jackwei86 commented 2 months ago

> I am looking for the same! I need to run SAM2 on a live video stream, but I am getting just ~1.5 fps on an NVIDIA T4.

The default/unoptimized performance is slow since SAM2 is a large model. Hopefully NVIDIA will optimize the model soon. You can follow this link: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/17

heyoeyo commented 2 months ago

From what I understand, it's possible to convert from ONNX to TensorRT. So it might be worth checking out the ONNX variants of SAM (see discussion in issue #186), such as this repo: https://github.com/axinc-ai/segment-anything-2/tree/onnx

It's also possible to get a decent speedup (~4x, possibly bigger than using TensorRT on its own) by reducing the resolution to 512px (see issue #257). Obviously this reduces the segmentation quality, but not as much as expected (at least imo).
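For context, the usual ONNX-to-TensorRT route is either NVIDIA's trtexec tool (e.g. `trtexec --onnx=model.onnx --saveEngine=model.engine --fp16`) or the TensorRT C++ builder API. Below is a minimal, untested sketch of the builder route, assuming TensorRT 8.x; the file name `sam2_image_encoder.onnx` is a placeholder for whichever SAM2 export you use, not an official artifact of this repo.

```cpp
// Sketch: build a TensorRT engine from an exported SAM2 ONNX file (TensorRT 8.x).
// Link against nvinfer and nvonnxparser.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <memory>

// Simple logger required by the TensorRT API.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Create the builder and an explicit-batch network definition.
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    const auto flags =
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));

    // Parse the ONNX model (placeholder file name).
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));
    if (!parser->parseFromFile("sam2_image_encoder.onnx",
                               static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse ONNX model" << std::endl;
        return 1;
    }

    // Build a serialized FP16 engine.
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    auto serialized =
        std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    if (!serialized) {
        std::cerr << "Engine build failed" << std::endl;
        return 1;
    }

    // Save the engine; at runtime it can be deserialized with nvinfer1::createInferRuntime().
    std::ofstream out("sam2_image_encoder.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}
```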

jackwei86 commented 1 month ago

> From what I understand, it's possible to convert from ONNX to TensorRT. So it might be worth checking out the ONNX variants of SAM (see discussion in issue #186), such as this repo: https://github.com/axinc-ai/segment-anything-2/tree/onnx
>
> It's also possible to get a decent speedup (~4x, possibly bigger than using TensorRT on its own) by reducing the resolution to 512px (see issue #257). Obviously this reduces the segmentation quality, but not as much as expected (at least imo).

Thanks for your reply. I have read the post and obtained the ONNX file. I will try to convert the ONNX model to TensorRT.

Aimol-l commented 1 month ago

I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export
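For anyone comparing this against a TensorRT path, a minimal sketch of an ONNX Runtime C++ session with the CUDA execution provider might look like the following; the model file, tensor names, and input shape are placeholders, not the actual I/O signature of the SAM2Export models.

```cpp
// Sketch: run one exported SAM2 ONNX model with the ONNX Runtime C++ API on CUDA.
// Tensor names and shapes below are placeholders; inspect the real export to get them.
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "sam2");

    // Enable the CUDA execution provider.
    Ort::SessionOptions options;
    OrtCUDAProviderOptions cuda_options{};
    options.AppendExecutionProvider_CUDA(cuda_options);

    Ort::Session session(env, "sam2_image_encoder.onnx", options);

    // Placeholder input: one 1024x1024 RGB image, NCHW float32.
    std::vector<int64_t> shape{1, 3, 1024, 1024};
    std::vector<float> pixels(1 * 3 * 1024 * 1024, 0.0f);
    Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem_info, pixels.data(), pixels.size(), shape.data(), shape.size());

    // Placeholder tensor names; query the real ones with session.GetInputNameAllocated(...).
    const char* input_names[] = {"image"};
    const char* output_names[] = {"image_embeddings"};

    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               input_names, &input, 1,
                               output_names, 1);

    auto info = outputs[0].GetTensorTypeAndShapeInfo();
    std::cout << "output elements: " << info.GetElementCount() << std::endl;
    return 0;
}
```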

jackwei86 commented 1 month ago

> I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export

Do you have any plans to further optimize with TensorRT to improve the FPS? With an RTX 4060 I can only reach about 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS does your C++ ONNX Runtime inference reach?

Aimol-l commented 1 month ago

> I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export
>
> Do you have any plans to further optimize with TensorRT to improve the FPS? With an RTX 4060 I can only reach about 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS does your C++ ONNX Runtime inference reach?

4070 Ti Super, CUDA inference, and I get about the same speed. https://www.bilibili.com/video/BV1iV4sesEak/?vd_source=53feb81d41ae94687975addeae931a0a

jackwei86 commented 1 month ago

> I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export
>
> Do you have any plans to further optimize with TensorRT to improve the FPS? With an RTX 4060 I can only reach about 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS does your C++ ONNX Runtime inference reach?
>
> 4070 Ti Super, CUDA inference, and I get about the same speed. https://www.bilibili.com/video/BV1iV4sesEak/?vd_source=53feb81d41ae94687975addeae931a0a

Thanks, I've followed you on Bilibili. Your video has a very detailed introduction to the transformer encoder, decoder, and attention that SAM2 uses, which is exactly what I need.