Open jackwei86 opened 2 months ago
I am looking for the same! I need to run SAM2 on a live video stream, but I am getting just ~1.5 fps on an NVIDIA T4.
There is https://github.com/NVIDIA-AI-IOT/nanosam, and I am wondering what the differences would be between creating the encoder/decoder .engine files for SAM2 vs. SAM1.
SAM1 can only be used for images, while SAM2 supports both images and videos. I found some information on the NVIDIA forum: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/19. NVIDIA may optimize SAM2 through TAO / DeepStream in the future. I also found FasterTransformer, but the SAM2 model is not included.
Thanks for your reply. I really appreciate it.
The default/unoptimized performance is slow since SAM2 is a large model. Hopefully NVIDIA will optimize the model soon. You can follow this link: https://forums.developer.nvidia.com/t/unable-to-install-sam2-on-orin-nano/302009/17
From what I understand, it's possible to convert from ONNX to TensorRT. So it might be worth checking out the ONNX variants of SAM (see the discussion in issue #186), such as this repo: https://github.com/axinc-ai/segment-anything-2/tree/onnx
It's also possible to get a decent speedup (~4x, possibly bigger than using TensorRT on its own) by reducing the input resolution to 512 px (see issue #257). Obviously this reduces the segmentation quality, but not as much as expected (at least imo).
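For reference, this is a minimal sketch of what the ONNX-to-TensorRT conversion looks like with the TensorRT C++ API (nvonnxparser). The file names, the fixed-shape assumption, and the workspace size are placeholders on my side, not taken from the repo above; trtexec can do the same thing from the command line.

```cpp
// Sketch only: build a TensorRT engine from an exported SAM2 image-encoder ONNX file.
// "image_encoder.onnx" and the 2 GB workspace limit are placeholders.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <memory>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));

    // Parse the exported ONNX graph into the TensorRT network definition.
    if (!parser->parseFromFile("image_encoder.onnx",
                               static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse ONNX file" << std::endl;
        return 1;
    }

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 2ULL << 30); // 2 GB
    // config->setFlag(nvinfer1::BuilderFlag::kFP16); // only after checking FP16 accuracy

    auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    if (!serialized) {
        std::cerr << "Engine build failed" << std::endl;
        return 1;
    }
    std::ofstream out("image_encoder.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()),
              static_cast<std::streamsize>(serialized->size()));
    return 0;
}
```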
Thanks for your reply. I have read the post and got the ONNX file. I will try to convert the ONNX model to TensorRT.
I implemented the export of all the ONNX files and video inference through C++ ONNX Runtime. Inference with C++: https://github.com/Aimol-l/OrtInference Export ONNX files: https://github.com/Aimol-l/SAM2Export
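For anyone following along, this is roughly what the ONNX Runtime C++ side looks like. It is only a sketch: the model path, the input/output tensor names, and the 1x3x1024x1024 shape are my assumptions (based on SAM2's default image size), not taken from those repos.

```cpp
// Sketch: run an exported SAM2 image encoder with ONNX Runtime on the CUDA EP.
// "image_encoder.onnx", "image" and "image_embeddings" are assumed names.
#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "sam2");
    Ort::SessionOptions opts;
    OrtCUDAProviderOptions cuda_opts{};            // defaults: device 0
    opts.AppendExecutionProvider_CUDA(cuda_opts);

    Ort::Session encoder(env, "image_encoder.onnx", opts);

    // Dummy preprocessed frame: 1x3x1024x1024, normalized float32.
    std::vector<int64_t> shape{1, 3, 1024, 1024};
    std::vector<float> input(1 * 3 * 1024 * 1024, 0.0f);

    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        mem, input.data(), input.size(), shape.data(), shape.size());

    const char* in_names[]  = {"image"};            // assumed input name
    const char* out_names[] = {"image_embeddings"}; // assumed output name
    auto outputs = encoder.Run(Ort::RunOptions{nullptr},
                               in_names, &tensor, 1, out_names, 1);
    // outputs[0] would then be fed to the prompt/mask decoder.
    return 0;
}
```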
Do you have plans to further optimize with TensorRT to improve FPS? Right now on an RTX 4060 I can only reach 10 FPS (sam2_hiera_small.pt, image_size: 1024). What inference hardware are you using, and how many FPS can you get with C++ onnxruntime inference?
4070 Ti Super, CUDA inference, about the same speed. https://www.bilibili.com/video/BV1iV4sesEak/?vd_source=53feb81d41ae94687975addeae931a0a
Thanks, I've followed you on Bilibili. Your video has a very detailed introduction to the transformer encoder/decoder/attention used in SAM2, which is exactly what I need.
Have you tried a TensorRT approach? When I convert the image_encoder from your ONNX models directly with --fp16, I get precision mismatch issues.
TensorRT is very slow for me here.
That part of the code does use TensorRT. I'm not sure about the float16 issue; I haven't used it.
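On the FP16 mismatch: a common workaround (only a sketch, not verified on these exact exports) is to enable FP16 but pin accuracy-sensitive layers, e.g. Softmax or the normalization layers, back to FP32 and tell the builder to obey those constraints. The name-based "Norm" check below is my own heuristic.

```cpp
// Sketch: mixed-precision build that keeps accuracy-sensitive layers in FP32.
// Assumes a network/config pair created as in the engine-build sketch above.
#include <NvInfer.h>
#include <string>

void constrainPrecision(nvinfer1::INetworkDefinition& network,
                        nvinfer1::IBuilderConfig& config) {
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
    config.setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);
    for (int i = 0; i < network.getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network.getLayer(i);
        const bool sensitive =
            layer->getType() == nvinfer1::LayerType::kSOFTMAX ||
            std::string(layer->getName()).find("Norm") != std::string::npos; // heuristic
        if (sensitive) {
            layer->setPrecision(nvinfer1::DataType::kFLOAT);     // compute in FP32
            layer->setOutputType(0, nvinfer1::DataType::kFLOAT); // keep FP32 output
        }
    }
}
```

Newer trtexec builds expose similar precision-constraint options if you would rather not touch the build code.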
@Aimol-l It's pretty fast for me: on a 4090, image_encoder + mask_decoder takes 10+ ms. But I've run into a problem: after resizing low_mask back to the original image size, the mask looks very fragmented. I didn't see this before with NVIDIA's nanosam. Have you encountered this?
I've only tested on a small amount of data, but I haven't seen this problem. You can inspect the low_mask probability map directly and check whether the binarization threshold is off.
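A minimal sketch of that ordering with OpenCV, assuming low_mask is the low-resolution float logits map: upsample the logits bilinearly first, then binarize at logit 0 (probability 0.5), rather than binarizing the small mask and resizing the binary image, which tends to produce the fragments described above.

```cpp
// Sketch: upscale SAM-style low-resolution mask logits before thresholding.
#include <opencv2/opencv.hpp>

cv::Mat upscale_mask(const cv::Mat& low_mask_logits, int out_w, int out_h) {
    cv::Mat up, binary;
    // Bilinear interpolation on the continuous logits avoids blocky fragments.
    cv::resize(low_mask_logits, up, cv::Size(out_w, out_h), 0, 0, cv::INTER_LINEAR);
    // Threshold at logit 0, i.e. probability 0.5.
    cv::threshold(up, binary, 0.0, 255.0, cv::THRESH_BINARY);
    binary.convertTo(binary, CV_8U);
    return binary;
}
```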
sam-cpp-macos is a Segment Anything Model 2 C++ wrapper for macOS and Ubuntu CPU/GPU. This code runs a SAM2 ONNX model in C++ and is used in the macOS app RectLabel. It currently supports only image prediction, not video prediction. We hope this code is helpful for some users.
This is for the macOS CPU use case.
I'm not quite familiar with the Transformer model. There are more steps involved than with other models because of the encoder and decoder; for example, the output of the last encoder block needs to be used as the input of the next encoder block, etc. Does anyone have an example of using C++ with NVIDIA TensorRT?
To run Segment Anything Model 2 on a live video stream, I have followed the link segment-anything-2-real-time. I need to implement the functionality in C++ with NVIDIA TensorRT for low latency.
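In case it helps, the TensorRT (8.5+) runtime side roughly looks like the sketch below. The engine path and the buffer handling are placeholders of mine, it assumes a fixed-shape FP32 engine, and preprocessing plus the decoder/memory stages are omitted.

```cpp
// Sketch: deserialize a prebuilt engine and run one inference with enqueueV3.
// "image_encoder.engine" is a placeholder path.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    // Load the serialized engine from disk.
    std::ifstream file("image_encoder.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext());

    // Allocate device buffers for every I/O tensor (fixed shapes, float32 assumed).
    std::vector<void*> buffers;
    for (int i = 0; i < engine->getNbIOTensors(); ++i) {
        const char* name = engine->getIOTensorName(i);
        nvinfer1::Dims dims = engine->getTensorShape(name);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d) count *= static_cast<size_t>(dims.d[d]);
        void* ptr = nullptr;
        cudaMalloc(&ptr, count * sizeof(float));
        buffers.push_back(ptr);
        context->setTensorAddress(name, ptr);
    }

    // cudaMemcpyAsync the preprocessed frame into the input buffer here, then:
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);
    // ...copy the outputs back and feed them to the decoder / memory modules.

    for (void* p : buffers) cudaFree(p);
    cudaStreamDestroy(stream);
    return 0;
}
```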