Welcome to the TensorRT implementation of the "Segment Anything" model!
This repository contains a TensorRT implementation of the "Segment Anything" model. While I found existing implementations for vit_b and vit_l, I couldn't find one for vit_h, so to the best of my knowledge this is the first implementation available online that covers all three model types.
I'm open to contributions and feedback. If you'd like to contribute or provide feedback, feel free to open an issue or submit a pull request!
This repository comes with a Dockerfile for easy setup. However, there are some additional requirements:
The vit_b and vit_l models export without complications. The vit_h model, however, poses a challenge: at roughly 2.6 GB it exceeds protobuf's 2 GB limit, so it has to be split into two separate parts, producing two distinct models that are later run as an ensemble. Splitting keeps each exported file within the protobuf size constraint and allows the conversion to succeed.
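For illustration, here is a minimal sketch of the split-export idea, assuming SAM's standard image encoder layout (`patch_embed`, `pos_embed`, `blocks`, `neck`); the class names and split point are illustrative, not the repo's exact code:

```python
import torch.nn as nn

class EncoderPart1(nn.Module):
    """First half: patch embedding plus the first half of the transformer blocks."""
    def __init__(self, encoder):
        super().__init__()
        self.patch_embed = encoder.patch_embed
        self.pos_embed = encoder.pos_embed
        self.blocks = encoder.blocks[: len(encoder.blocks) // 2]

    def forward(self, x):
        x = self.patch_embed(x)
        if self.pos_embed is not None:
            x = x + self.pos_embed
        for blk in self.blocks:
            x = blk(x)
        return x

class EncoderPart2(nn.Module):
    """Second half: remaining transformer blocks plus the neck."""
    def __init__(self, encoder):
        super().__init__()
        self.blocks = encoder.blocks[len(encoder.blocks) // 2:]
        self.neck = encoder.neck

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.neck(x.permute(0, 3, 1, 2))

# Each half is then exported to its own ONNX file, e.g.:
# torch.onnx.export(EncoderPart1(sam.image_encoder), dummy_image, "vit_h_part1.onnx")
# torch.onnx.export(EncoderPart2(sam.image_encoder), part1_output, "vit_h_part2.onnx")
```

At inference time the output of the first engine is fed as the input of the second, which is why the vit_h commands below take two model paths.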
```bash
git clone https://github.com/ItayElam/SegmentAnything-TensorRT.git
cd SegmentAnything-TensorRT
chmod +x launch.sh
./launch.sh -b # build the image
./launch.sh -r # run the image
```
**vit_b**

Model | Average FPS | Average Time (sec) | Relative FPS | Relative Time (%) |
---|---|---|---|---|
PyTorch model | 9.96 | 0.100417 | 1.0 | 100.0 |
TensorRT model | 15.24 | 0.065603 | 1.53 | 65.33 |
TensorRT FP16 model | 29.32 | 0.034104 | 2.94 | 33.96 |
**vit_l**

Model | Average FPS | Average Time (sec) | Relative FPS | Relative Time (%) |
---|---|---|---|---|
PyTorch model | 3.91 | 0.255552 | 1.0 | 100.0 |
TensorRT model | 4.81 | 0.208019 | 1.23 | 81.4 |
TensorRT FP16 model | 11.09 | 0.090139 | 2.84 | 35.27 |
**vit_h**

Model | Average FPS | Average Time (sec) | Relative FPS | Relative Time (%) |
---|---|---|---|---|
PyTorch model | 2.22 | 0.45045 | 1.0 | 100.0 |
TensorRT model | 2.37 | 0.421377 | 1.07 | 93.55 |
TensorRT FP16 model | 5.97 | 0.167488 | 2.69 | 37.18 |
Model | Minimum IoU | Mean IoU |
---|---|---|
vit_b FP32 | 0.9986 | 0.9997 |
vit_b FP16 | 0.9931 | 0.9986 |
vit_l FP32 | 0.9983 | 0.9996 |
vit_l FP16 | 0.9958 | 0.9987 |
vit_h FP32 | 0.9982 | 0.9997 |
vit_h FP16 | 0.9911 | 0.9983 |
```bash
python main.py export --model_path pth_model/sam_vit_b_01ec64.pth --model_precision fp32
python main.py export --model_path pth_model/sam_vit_b_01ec64.pth --model_precision fp16
python main.py export --model_path pth_model/sam_vit_b_01ec64.pth --model_precision both
# Repeat the above commands for vit_l and vit_h models
```
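Under the hood, the export step presumably builds a serialized TensorRT engine from the ONNX graph. A rough sketch of that conversion with the standard TensorRT 8.x Python builder API (the paths and 4 GB workspace size are assumptions, not the repo's exact settings):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # hypothetical path to the exported ONNX graph
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # assumed 4 GB workspace
config.set_flag(trt.BuilderFlag.FP16)  # only when building the fp16 variant

serialized = builder.build_serialized_network(network, config)
with open("model_fp32.engine", "wb") as f:
    f.write(serialized)
```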
```bash
python main.py benchmark --sam_checkpoint pth_model/sam_vit_b_01ec64.pth --model_type vit_b --warmup_iters 5 --measure_iters 50
# Repeat the above command for vit_l and vit_h models
# --warmup_iters and --measure_iters are optional and default to 5 and 50 respectively
```
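The warmup/measure pattern behind the benchmark numbers above is the standard one: discard a few initial runs (lazy initialization, caching), then average over a timed loop. A minimal sketch, where `run_once` is an illustrative stand-in for a single forward pass:

```python
import time

def benchmark(run_once, warmup_iters=5, measure_iters=50):
    for _ in range(warmup_iters):      # warmup runs are not timed
        run_once()
    start = time.perf_counter()
    for _ in range(measure_iters):
        run_once()
    elapsed = time.perf_counter() - start
    avg_time = elapsed / measure_iters
    return avg_time, 1.0 / avg_time    # (average seconds, average FPS)
```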
```bash
python main.py accuracy --image_dir test_images --model_type vit_b --sam_checkpoint pth_model/sam_vit_b_01ec64.pth
# Repeat the above command for vit_l and vit_h models
```
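The accuracy check compares the masks produced by the TensorRT engine against the PyTorch reference using IoU. A minimal sketch of the metric, assuming boolean mask arrays (this mirrors the tables above, not necessarily the repo's exact code):

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(intersection) / float(union) if union else 1.0

# ious = [mask_iou(pt_mask, trt_mask) for pt_mask, trt_mask in mask_pairs]
# print(min(ious), sum(ious) / len(ious))  # Minimum IoU, Mean IoU
```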
When running inference, the image you provide will open so you can choose a point prompt. Once you're satisfied with the location, press Enter to run inference.
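As a rough illustration of that interaction, assuming an OpenCV-style window (the repo may implement it differently):

```python
import cv2

point = []

def on_click(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        point[:] = [x, y]  # remember the last clicked location

img = cv2.imread("images/original_image.jpg")
cv2.namedWindow("select point")
cv2.setMouseCallback("select point", on_click)
while True:
    cv2.imshow("select point", img)
    if cv2.waitKey(10) == 13:  # Enter confirms the selection
        break
cv2.destroyAllWindows()
```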
```bash
# vit_b and vit_l
python main.py infer --pth_path pth_model/sam_vit_b_01ec64.pth --model_1 exported_models/vit_b/model_fp32.engine --img_path images/original_image.jpg
# vit_h
python main.py infer --pth_path pth_model/sam_vit_h_4b8939.pth --model_1 exported_models/vit_h/model_fp32_1.engine --model_2 exported_models/vit_h/model_fp32_2.engine --img_path images/original_image.jpg
```
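For reference, loading and running a serialized engine with the TensorRT Python runtime looks roughly like this. This is a sketch using the TensorRT 8.x binding API with pycuda and assuming static input shapes; the repo's buffer handling may differ:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

ENGINE_PATH = "exported_models/vit_b/model_fp32.engine"  # produced by the export step

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate a host/device buffer pair per binding (assumes static shapes).
bindings, buffers = [], []
for i in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(shape, dtype=dtype)
    device = cuda.mem_alloc(host.nbytes)
    bindings.append(int(device))
    buffers.append((host, device, engine.binding_is_input(i)))

# Fill the input host buffers with your preprocessed image, then:
for host, device, is_input in buffers:
    if is_input:
        cuda.memcpy_htod(device, host)
context.execute_v2(bindings)
for host, device, is_input in buffers:
    if not is_input:
        cuda.memcpy_dtoh(host, device)
```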
Feel free to modify the command arguments according to your setup and requirements. For additional usage scenarios, you can refer to tests.py.
Should you have any questions or wish to contribute, please feel free to open an issue or create a pull request.