yaohusama opened 1 year ago
I encountered the same problem.
Hi, label 0 does not represent the background, because the dataloader sets "reduce_zero_label=True". I double-checked the inference code on my own machine, and the results seem correct:
I'm not sure whether this is due to data processing or the environment. I will try to find out why.
I had the same problem. I think the following line is problematic if label 0 in gt_semantic_seg represents the background class.
So I added the following lines before the above line:
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
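For illustration, here is a minimal, self-contained sketch (using a NumPy array in place of the actual tensor) of what this remapping does, assuming 255 is the ignore index as in mmseg:

```python
import numpy as np

# Toy ground-truth map: 0 = background, 1..N = object classes.
gt_semantic_seg = np.array([0, 1, 2, 20, 0])

# Map background (0) to the ignore index 255, then shift all remaining
# labels down by one so class 1 becomes 0 -- the same convention that
# reduce_zero_label=True applies in the dataloader.
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255

print(gt_semantic_seg.tolist())  # [255, 0, 1, 19, 255]
```

So former background pixels end up ignored by the loss, and the 20 VOC object classes occupy indices 0..19.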
Then the performance improved, but it is still not as good as the reported results, especially for the unseen classes, with inductive training for 20K iterations:
+++++++++++ Total classes +++++++++++++
per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| aeroplane | 82.16 | 96.41 |
| bicycle | 98.42 | 98.78 |
| bird | 87.65 | 89.47 |
| boat | 88.1 | 90.22 |
| bottle | 96.01 | 98.25 |
| bus | 86.64 | 87.77 |
| car | 96.58 | 98.58 |
| cat | 34.34 | 87.5 |
| chair | 91.81 | 96.58 |
| cow | 66.64 | 68.45 |
| diningtable | 95.54 | 97.53 |
| dog | 94.8 | 95.97 |
| horse | 90.6 | 97.06 |
| motorbike | 95.28 | 97.43 |
| person | 76.74 | 79.97 |
| pottedplant | 11.68 | 16.36 |
| sheep | 3.61 | 10.32 |
| sofa | 0.0 | 0.0 |
| train | 0.22 | 0.25 |
| tvmonitor | 0.0 | nan |
+-------------+-------+-------+
Summary:
+-------+-------+-------+
| aAcc | mIoU | mAcc |
+-------+-------+-------+
| 78.61 | 64.84 | 74.05 |
+-------+-------+-------+
+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| aeroplane | 82.16 | 96.41 |
| bicycle | 98.42 | 98.78 |
| bird | 87.65 | 89.47 |
| boat | 88.1 | 90.22 |
| bottle | 96.01 | 98.25 |
| bus | 86.64 | 87.77 |
| car | 96.58 | 98.58 |
| cat | 34.34 | 87.5 |
| chair | 91.81 | 96.58 |
| cow | 66.64 | 68.45 |
| diningtable | 95.54 | 97.53 |
| dog | 94.8 | 95.97 |
| horse | 90.6 | 97.06 |
| motorbike | 95.28 | 97.43 |
| person | 76.74 | 79.97 |
+-------------+-------+-------+
Seen Summary:
+-------+-------+------+
| aAcc | mIoU | mAcc |
+-------+-------+------+
| 78.61 | 85.42 | 92.0 |
+-------+-------+------+
+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| pottedplant | 11.68 | 16.36 |
| sheep | 3.61 | 10.32 |
| sofa | 0.0 | 0.0 |
| train | 0.22 | 0.25 |
| tvmonitor | 0.0 | nan |
+-------------+-------+-------+
Unseen Summary:
+-------+------+------+
| aAcc | mIoU | mAcc |
+-------+------+------+
| 78.61 | 3.1 | 6.73 |
+-------+------+------+
When I tried to train the model following the README, the losses barely changed. Did you encounter this problem before? Thank you!
I got test results much better than the above, but they are still lower than the numbers in the paper. I also found that testing after training for 1.8K iterations on my machine shows better performance than after 2K. Maybe the reason is a difference in environments?
Here are the inference results after training for 1.8K iterations:
+++++++++++ Total classes +++++++++++++
per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| aeroplane | 97.34 | 97.72 |
| bicycle | 87.74 | 97.87 |
| bird | 98.56 | 98.8 |
| boat | 93.1 | 96.97 |
| bottle | 94.74 | 96.7 |
| bus | 97.55 | 98.18 |
| car | 95.36 | 98.38 |
| cat | 96.47 | 96.84 |
| chair | 53.7 | 59.87 |
| cow | 92.87 | 93.05 |
| diningtable | 84.8 | 89.13 |
| dog | 94.65 | 97.09 |
| horse | 95.21 | 97.07 |
| motorbike | 94.39 | 97.7 |
| person | 96.18 | 97.85 |
| pottedplant | 48.31 | 53.76 |
| sheep | 89.78 | 99.74 |
| sofa | 56.97 | 97.2 |
| train | 97.75 | 99.9 |
| tvmonitor | 29.58 | 32.23 |
+-------------+-------+-------+
Summary:
+------+-------+------+
| aAcc | mIoU | mAcc |
+------+-------+------+
| 93.4 | 84.75 | 89.8 |
+------+-------+------+
+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| aeroplane | 97.34 | 97.72 |
| bicycle | 87.74 | 97.87 |
| bird | 98.56 | 98.8 |
| boat | 93.1 | 96.97 |
| bottle | 94.74 | 96.7 |
| bus | 97.55 | 98.18 |
| car | 95.36 | 98.38 |
| cat | 96.47 | 96.84 |
| chair | 53.7 | 59.87 |
| cow | 92.87 | 93.05 |
| diningtable | 84.8 | 89.13 |
| dog | 94.65 | 97.09 |
| horse | 95.21 | 97.07 |
| motorbike | 94.39 | 97.7 |
| person | 96.18 | 97.85 |
+-------------+-------+-------+
Seen Summary:
+------+-------+-------+
| aAcc | mIoU | mAcc |
+------+-------+-------+
| 93.4 | 91.51 | 94.22 |
+------+-------+-------+
+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| pottedplant | 48.31 | 53.76 |
| sheep | 89.78 | 99.74 |
| sofa | 56.97 | 97.2 |
| train | 97.75 | 99.9 |
| tvmonitor | 29.58 | 32.23 |
+-------------+-------+-------+
Unseen Summary:
+------+-------+-------+
| aAcc | mIoU | mAcc |
+------+-------+-------+
| 93.4 | 64.48 | 76.57 |
+------+-------+-------+
And here are the inference results after training for 2K iterations:
+++++++++++ Total classes +++++++++++++
per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| aeroplane | 98.16 | 98.52 |
| bicycle | 89.11 | 97.76 |
| bird | 98.48 | 98.73 |
| boat | 93.34 | 96.8 |
| bottle | 94.44 | 96.18 |
| bus | 98.0 | 98.71 |
| car | 95.12 | 97.59 |
| cat | 96.55 | 96.8 |
| chair | 51.67 | 56.8 |
| cow | 93.61 | 93.81 |
| diningtable | 84.6 | 89.38 |
| dog | 95.02 | 97.55 |
| horse | 96.1 | 96.98 |
| motorbike | 94.41 | 97.77 |
| person | 96.1 | 97.84 |
| pottedplant | 40.15 | 44.11 |
| sheep | 88.94 | 99.75 |
| sofa | 54.46 | 97.26 |
| train | 95.52 | 99.76 |
| tvmonitor | 18.38 | 20.65 |
+-------------+-------+-------+
Summary:
+-------+-------+-------+
| aAcc | mIoU | mAcc |
+-------+-------+-------+
| 92.86 | 83.61 | 88.64 |
+-------+-------+-------+
+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| aeroplane | 98.16 | 98.52 |
| bicycle | 89.11 | 97.76 |
| bird | 98.48 | 98.73 |
| boat | 93.34 | 96.8 |
| bottle | 94.44 | 96.18 |
| bus | 98.0 | 98.71 |
| car | 95.12 | 97.59 |
| cat | 96.55 | 96.8 |
| chair | 51.67 | 56.8 |
| cow | 93.61 | 93.81 |
| diningtable | 84.6 | 89.38 |
| dog | 95.02 | 97.55 |
| horse | 96.1 | 96.98 |
| motorbike | 94.41 | 97.77 |
| person | 96.1 | 97.84 |
+-------------+-------+-------+
Seen Summary:
+-------+-------+-------+
| aAcc | mIoU | mAcc |
+-------+-------+-------+
| 92.86 | 91.65 | 94.08 |
+-------+-------+-------+
+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+-------+-------+
| Class | IoU | Acc |
+-------------+-------+-------+
| pottedplant | 40.15 | 44.11 |
| sheep | 88.94 | 99.75 |
| sofa | 54.46 | 97.26 |
| train | 95.52 | 99.76 |
| tvmonitor | 18.38 | 20.65 |
+-------------+-------+-------+
Unseen Summary:
+-------+-------+-------+
| aAcc | mIoU | mAcc |
+-------+-------+-------+
| 92.86 | 59.49 | 72.31 |
+-------+-------+-------+
I am getting poor results on COCO just by running inference with the models shared in the repository:
I want to point out that these were evaluated on the Panoptic COCO dataset, not COCO-Stuff. Is it possible these results are actually fine for a slightly different distribution, or is something else wrong?
Seen Summary:
+-------+-------+-------+
| aAcc | mIoU | mAcc |
+-------+-------+-------+
| 28.39 | 29.62 | 44.45 |
+-------+-------+-------+
+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------+-----+-----+
| Class | IoU | Acc |
+-------+-----+-----+
+-------+-----+-----+
Unseen Summary:
+-------+------+------+
| aAcc | mIoU | mAcc |
+-------+------+------+
| 28.39 | nan | nan |
The command is:
CHECKPOINT=weights/coco_fully_512_vit_base.pth
CONFIG_FILE=vpt_seg_fully_vit-b_512x512_80k_12_100_multi.py
python eval_zegclip_coco.py ${CONFIG_FILE} ${CHECKPOINT} --eval=mIoU --out $OUT_FILE
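One way to sanity-check the Panoptic-vs-COCO-Stuff suspicion above is to inspect the label IDs actually present in the ground-truth annotations and compare them against the expected COCO-Stuff range. A minimal sketch (the function name and the 171-class/255-ignore convention are assumptions based on the standard COCO-Stuff setup, not code from this repo):

```python
import numpy as np

def out_of_range_labels(ann, num_classes=171, ignore_index=255):
    """Return label IDs outside [0, num_classes) that are not the ignore index.

    COCO-Stuff annotations use IDs 0..170 with 255 as ignore; Panoptic COCO
    PNGs encode category IDs differently, so out-of-range IDs here suggest
    the evaluation is reading the wrong annotation format.
    """
    labels = np.unique(ann)
    return labels[(labels >= num_classes) & (labels != ignore_index)]

# Synthetic example: label 200 is neither a valid COCO-Stuff ID nor ignore.
ann = np.array([[0, 170, 255], [3, 200, 12]])
print(out_of_range_labels(ann).tolist())  # [200]
```

Running this over a few real annotation files (loaded with PIL, for instance) would quickly show whether the label spaces line up.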
Excuse me, after training for 20K iterations on the augmented VOC dataset, the metrics for both seen and unseen classes at inference are close to 0. What could be the reason for this?