facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Is it possible to use NVIDIA TensorRT to accelerate SAM2 inference with a C++ implementation? #284

Open jackwei86 opened 2 months ago

jackwei86 commented 2 months ago

I'm not very familiar with Transformer models. Compared with other models, the encoder and decoder involve extra steps, e.g. the output of one encoder block has to be fed as the input of the next encoder block. Does anyone have an example of doing this in C++ with NVIDIA TensorRT?

To run Segment Anything Model 2 on a live video stream, I have followed segment-anything-2-real-time. I need to implement the same functionality in C++ with NVIDIA TensorRT for low latency.

Henistein commented 2 months ago

I am looking for the same! I need to run SAM2 on a live video stream, but I am getting just ~1.5 fps on an NVIDIA T4.

free-soellingeraj commented 2 months ago

There is https://github.com/NVIDIA-AI-IOT/nanosam, and I am wondering what the differences would be in creating the encoder/decoder .engine files for SAM2 vs SAM1.

jackwei86 commented 2 months ago

> There is https://github.com/NVIDIA-AI-IOT/nanosam, and I am wondering what the differences would be in creating the encoder/decoder .engine files for SAM2 vs SAM1.

SAM1 can only be used for images, while SAM2 supports both images and videos. I found some information on the NVIDIA forum: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/19. NVIDIA may optimize SAM2 through TAO / DeepStream in the future. I also found FasterTransformer, but the SAM2 model is not included there.

Thanks for your reply. I really appreciate it.

jackwei86 commented 2 months ago

> I am looking for the same! I need to run SAM2 on a live video stream, but I am getting just ~1.5 fps on an NVIDIA T4.

The default/unoptimized performance is slow since SAM2 is a large model. Hopefully NVIDIA will optimize the model soon. You can follow this link: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/17

heyoeyo commented 2 months ago

From what I understand, it's possible to convert from ONNX to TensorRT. So it might be worth checking out the ONNX variants of SAM (see discussion in issue #186), such as this repo: https://github.com/axinc-ai/segment-anything-2/tree/onnx

It's also possible to get a decent speedup (~4x, possibly bigger than using TensorRT on its own) by reducing the resolution to 512px (see issue #257). Obviously this reduces the segmentation quality, but not as much as expected (at least imo).
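For context, the usual ONNX-to-TensorRT route is either NVIDIA's trtexec tool (e.g. `trtexec --onnx=model.onnx --saveEngine=model.engine --fp16`) or the TensorRT C++ builder API. Below is a minimal, untested sketch of the builder route, assuming TensorRT 8.x; the file name `sam2_image_encoder.onnx` is a placeholder for whichever SAM2 export you use, not an official artifact of this repo.

```cpp
// Sketch: build a TensorRT engine from an exported SAM2 ONNX file (TensorRT 8.x).
// Link against nvinfer and nvonnxparser.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <memory>

// Simple logger required by the TensorRT API.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Create the builder and an explicit-batch network definition.
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    const auto flags =
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));

    // Parse the ONNX model (placeholder file name).
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));
    if (!parser->parseFromFile("sam2_image_encoder.onnx",
                               static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse ONNX model" << std::endl;
        return 1;
    }

    // Build a serialized FP16 engine.
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    auto serialized =
        std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    if (!serialized) {
        std::cerr << "Engine build failed" << std::endl;
        return 1;
    }

    // Save the engine; at runtime it can be deserialized with nvinfer1::createInferRuntime().
    std::ofstream out("sam2_image_encoder.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}
```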

jackwei86 commented 1 month ago

> From what I understand, it's possible to convert from ONNX to TensorRT. So it might be worth checking out the ONNX variants of SAM (see discussion in issue #186), such as this repo: https://github.com/axinc-ai/segment-anything-2/tree/onnx
>
> It's also possible to get a decent speedup (~4x, possibly bigger than using TensorRT on its own) by reducing the resolution to 512px (see issue #257). Obviously this reduces the segmentation quality, but not as much as expected (at least imo).

Thanks for your reply. I have read the post and obtained the ONNX file. I will try to convert the ONNX model to TensorRT.

Aimol-l commented 1 month ago

I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export
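For anyone comparing this against a TensorRT path, a minimal sketch of an ONNX Runtime C++ session with the CUDA execution provider might look like the following; the model file, tensor names, and input shape are placeholders, not the actual I/O signature of the SAM2Export models.

```cpp
// Sketch: run one exported SAM2 ONNX model with the ONNX Runtime C++ API on CUDA.
// Tensor names and shapes below are placeholders; inspect the real export to get them.
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "sam2");

    // Enable the CUDA execution provider.
    Ort::SessionOptions options;
    OrtCUDAProviderOptions cuda_options{};
    options.AppendExecutionProvider_CUDA(cuda_options);

    Ort::Session session(env, "sam2_image_encoder.onnx", options);

    // Placeholder input: one 1024x1024 RGB image, NCHW float32.
    std::vector<int64_t> shape{1, 3, 1024, 1024};
    std::vector<float> pixels(1 * 3 * 1024 * 1024, 0.0f);
    Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem_info, pixels.data(), pixels.size(), shape.data(), shape.size());

    // Placeholder tensor names; query the real ones with session.GetInputNameAllocated(...).
    const char* input_names[] = {"image"};
    const char* output_names[] = {"image_embeddings"};

    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               input_names, &input, 1,
                               output_names, 1);

    auto info = outputs[0].GetTensorTypeAndShapeInfo();
    std::cout << "output elements: " << info.GetElementCount() << std::endl;
    return 0;
}
```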

jackwei86 commented 1 month ago

> I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export

Do you have any plans to further optimize with TensorRT to improve the FPS? With an RTX 4060 I can only reach about 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS does your C++ ONNX Runtime inference reach?

Aimol-l commented 1 month ago

> I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export
>
> Do you have any plans to further optimize with TensorRT to improve the FPS? With an RTX 4060 I can only reach about 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS does your C++ ONNX Runtime inference reach?

4070 Ti Super, CUDA inference, and I get about the same speed. https://www.bilibili.com/video/BV1iV4sesEak/?vd_source=53feb81d41ae94687975addeae931a0a

jackwei86 commented 1 month ago

> I have exported all of the ONNX files and implemented video inference with C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export the ONNX files: https://github.com/Aimol-l/SAM2Export
>
> Do you have any plans to further optimize with TensorRT to improve the FPS? With an RTX 4060 I can only reach about 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS does your C++ ONNX Runtime inference reach?
>
> 4070 Ti Super, CUDA inference, and I get about the same speed. https://www.bilibili.com/video/BV1iV4sesEak/?vd_source=53feb81d41ae94687975addeae931a0a

Thanks, I've followed you on Bilibili. Your video has a very detailed introduction to the transformer encoder, decoder, and attention that SAM2 uses, which is exactly what I need.