Hi @HuYanchen-hub, thanks for your interest.
Did you set task=instance when running the evaluation script? The numbers you shared seem to correspond to task=panoptic. We mention this in the instructions here.
I ran the evaluation on an A100 myself now and obtained the following results for the DiNAT-L backbone:
#### DiNAT-L OneFormer
```
[07/05 05:47:28 d2.evaluation.testing]: copypaste: Task: segm
[07/05 05:47:28 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[07/05 05:47:28 d2.evaluation.testing]: copypaste: 49.2071,73.8117,53.6113,29.4197,53.7316,70.9744
```
You might experience a variance of 0.1-0.2 units when running evaluations on different machines (I remember noticing something like that while experimenting).
Thanks for your reply. When I set task=instance, I got the correct result. But when I use the Swin-L backbone to evaluate the semantic segmentation task with task=semantic, the mIoU I get is lower than with task=panoptic. Are the mIoU results you reported the higher of the two, or is the difference due to variance across machines?
```
copypaste: Task: sem_seg
[07/05 21:06:35 d2.evaluation.testing]: copypaste: mIoU,fwIoU,mACC,pACC
[07/05 21:06:35 d2.evaluation.testing]: copypaste: 67.2288,72.4984,78.5884,82.9312
```
And when I evaluate the DiNAT-L backbone with task=panoptic, my PQ_st result also differs from yours by 0.1.
```
Task: panoptic_seg
[07/05 21:14:12 d2.evaluation.testing]: copypaste: PQ,SQ,RQ,PQ_th,SQ_th,RQ_th,PQ_st,SQ_st,RQ_st
[07/05 21:14:12 d2.evaluation.testing]: copypaste: 57.9436,83.7602,68.4097,64.3089,84.9244,75.2713,48.3356,82.0030,58.0525
```
OneFormer is very good work, and we want to support this algorithm in the open-source object detection toolbox mmdetection, so we need to understand more of the experimental details. Thank you for your help.
Hi @HuYanchen-hub, thanks for working on adding support for OneFormer to mmdetection!
For each metric, we report the score obtained with the corresponding task setting. So, we report mIoU with task=semantic.
I believe the difference you notice is within the variance range both for mIoU and PQ_st.
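A minimal sketch of that metric-to-task correspondence, as described in this thread (the task names are assumed to match the values accepted by MODEL.TEST.TASK):

```python
# Which task setting yields each headline metric, per the maintainer's note.
# Task names are assumed; verify against the repo's MODEL.TEST.TASK values.
METRIC_TO_TASK = {
    "PQ": "panoptic",    # panoptic quality is reported from task=panoptic
    "AP": "instance",    # instance mask AP is reported from task=instance
    "mIoU": "semantic",  # mean IoU is reported from task=semantic
}
```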
Thanks for your reply!
Thank you very much, got it!
When I ran inference with the COCO pre-trained models you provide, I found that the instance segmentation accuracy on COCO consistently differs by 0.2 AP. The following are my experimental results (bold values are my reproduced numbers; the row above each is the reported result).
| Method | Backbone | PQ | PQ_th | PQ_st | AP | mIoU | #Params | Config | Checkpoint |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| OneFormer | Swin-L† | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | [config](https://github.com/SHI-Labs/OneFormer/blob/main/configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/150_16_swin_l_oneformer_coco_100ep.pth) |
| | | 57.9 | 64.4 | 48.0 | **48.8** | 67.4 | | | |
| OneFormer | DiNAT-L† | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | [config](https://github.com/SHI-Labs/OneFormer/blob/main/configs/coco/dinat/oneformer_dinat_large_bs16_100ep.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/150_16_dinat_l_oneformer_coco_100ep.pth) |
| | | 58.0 | 64.3 | **48.3** | **49.0** | 68.1 | | | |

The following is my experimental environment.