SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023
MIT License
1.39k stars, 129 forks

Inference time #105

Open · vietpho opened this issue 7 months ago


First off, thank you for sharing your amazing code with us. Your documentation states: "[Note: Inference on CPU may take up to 2 minutes. On a single RTX A6000 GPU, OneFormer can perform inference at more than 15 FPS.]" I also saw your reply to a question about real-time segmentation in the issues section, where you said that a model with a Swin-L backbone could achieve it.

However, I'm using an RTX 3090 and have tried several of the checkpoints you provided, and inference takes at least 2 seconds per image. I used R50, and my images are 1280x1280 pixels.
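To put the two figures on the same scale, the small calculation below converts the README's throughput claim and the observed latency into a common unit. It uses only the numbers quoted above. The comparison is rough: the GPUs differ (RTX A6000 vs RTX 3090), and the quoted note does not state the input resolution behind the 15 FPS figure.

```python
# Convert the reported throughput and the observed latency to a common unit.
reported_fps = 15.0        # "more than 15 FPS" on one RTX A6000 (README note)
observed_latency_s = 2.0   # "at least 2 seconds per image" on an RTX 3090

reported_latency_ms = 1000.0 / reported_fps  # upper bound on per-frame time
observed_fps = 1.0 / observed_latency_s      # upper bound on observed throughput
slowdown = observed_latency_s * 1000.0 / reported_latency_ms

print(f"reported: <= {reported_latency_ms:.1f} ms/frame")
print(f"observed: >= {observed_latency_s * 1000:.0f} ms/image ({observed_fps:.2f} FPS)")
print(f"gap: at least ~{slowdown:.0f}x")
```

In other words, the observed time is roughly 30 times longer than the per-frame budget implied by the README note.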

Here are the models I tested:

- 150_16_dinat_l_oneformer_coco_100ep.pth
- 150_16_swin_l_oneformer_coco_100ep.pth
- 250_16_swin_l_oneformer_cityscapes_90k.pth
- 1280x1280_250_16_swin_l_oneformer_ade20k_160k.pth

Could you help me understand why there is such a significant difference in inference time?
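The issue does not say how the 2-second figure was measured. Per-image wall-clock time can include one-off costs (such as CUDA context setup on the first call) and, depending on the script, image loading or visualization. In addition, GPU kernels run asynchronously, so timing can be misleading without a device synchronization. The helper below is a hypothetical, framework-agnostic sketch, not part of OneFormer. It times only the callable you pass in, skips a few warm-up calls, and accepts an optional `sync` hook; with PyTorch on CUDA, you would pass `torch.cuda.synchronize`.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, Optional


def benchmark(
    infer: Callable[[Any], Any],
    inputs: Iterable[Any],
    warmup: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Time `infer` on each input after `warmup` untimed calls.

    `sync` should block until all queued device work has finished
    (e.g. torch.cuda.synchronize for PyTorch on CUDA).
    """
    items = list(inputs)
    if len(items) <= warmup:
        raise ValueError("need more inputs than warm-up iterations")

    # Warm-up: early calls may include one-off setup costs that do not
    # reflect steady-state latency, so they are excluded from the stats.
    for x in items[:warmup]:
        infer(x)
    if sync is not None:
        sync()

    latencies_ms = []
    for x in items[warmup:]:
        start = time.perf_counter()
        infer(x)
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "fps": 1000.0 / mean_ms,
    }


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with the real model call.
    def fake_infer(x):
        time.sleep(0.01)
        return x

    print(benchmark(fake_infer, range(13), warmup=3))
```

Keeping preprocessing and visualization outside `infer` separates model latency from the rest of the pipeline, which makes the result comparable to an FPS figure.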