Hello,
First off, thank you for sharing your amazing code. I noticed that your documentation says: "[Note: Inference on CPU may take up to 2 minutes. On a single RTX A6000 GPU, OneFormer can perform inference at more than 15 FPS.]" I also saw in the issues section that, in response to a question about real-time segmentation, you stated that a model with a Swin-L backbone could achieve it.
However, I'm running demo.py on an RTX 3090 with several of the checkpoints you provided, and inference takes at least 2 seconds per image. That includes the R50 backbone; my images are 1280x1280.
Here are the models I tested:
150_16_dinat_l_oneformer_coco_100ep.pth
150_16_swin_l_oneformer_coco_100ep.pth
250_16_swin_l_oneformer_cityscapes_90k.pth
1280x1280_250_16_swin_l_oneformer_ade20k_160k.pth
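For reference, here is roughly how I measured the per-image latency. This is a hedged sketch, not code from your repo: the model is a stand-in `nn.Module` rather than the actual predictor built by demo.py, but the warm-up pass and `torch.cuda.synchronize()` calls are the standard way to time CUDA inference, so the timer doesn't stop before the asynchronous GPU kernels finish.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in model; in practice this would be the OneFormer predictor.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device).eval()
image = torch.rand(1, 3, 1280, 1280, device=device)  # matches my input size

with torch.no_grad():
    model(image)  # warm-up pass (CUDA context init, cuDNN autotuning)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    runs = 10
    for _ in range(runs):
        model(image)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / runs

print(f"avg latency per image: {elapsed * 1000:.1f} ms")
```

Measured this way, the numbers I quote above exclude one-time startup cost, so the ~2 s/image is steady-state latency, not just the first call.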
Could you help me understand why there is such a significant difference in inference time?