ZiqinZhou66 / ZegCLIP

Official implementation of the CVPR 2023 paper "ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation"
MIT License

The result of the inference stage is wrong #2

yaohusama opened this issue 1 year ago

yaohusama commented 1 year ago

Excuse me, after training for 20,000 iterations on the augmented VOC dataset, the metrics for both seen and unseen classes at the inference stage are close to 0. What could be the reason for this?

+++++++++++ Total classes +++++++++++++

per class results:
+-------------+------+------+
|    Class    | IoU  | Acc  |
+-------------+------+------+
|  aeroplane  | 0.0  | 0.0  |
|   bicycle   | 0.0  | 0.0  |
|     bird    | 0.0  | 0.0  |
|     boat    | 0.0  | 0.0  |
|    bottle   | 0.0  | 0.0  |
|     bus     | 0.24 | 0.52 |
|     car     | 0.0  | 0.0  |
|     cat     | 0.03 | 0.11 |
|    chair    | 0.0  | 0.0  |
|     cow     | 0.0  | 0.0  |
| diningtable | 0.01 | 0.01 |
|     dog     | 0.0  | 0.0  |
|    horse    | 0.0  | 0.0  |
|  motorbike  | 0.34 | 0.41 |
|    person   | 0.01 | 0.12 |
| pottedplant | 0.0  | 0.0  |
|    sheep    | 0.0  | 0.0  |
|     sofa    | 0.01 | 0.02 |
|    train    | 0.02 | 0.06 |
|  tvmonitor  | 0.0  | nan  |
+-------------+------+------+
Summary:
+------+------+------+
| aAcc | mIoU | mAcc |
+------+------+------+
| 0.12 | 0.03 | 0.07 |
+------+------+------+

+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+------+------+
|    Class    | IoU  | Acc  |
+-------------+------+------+
|  aeroplane  | 0.0  | 0.0  |
|   bicycle   | 0.0  | 0.0  |
|     bird    | 0.0  | 0.0  |
|     boat    | 0.0  | 0.0  |
|    bottle   | 0.0  | 0.0  |
|     bus     | 0.24 | 0.52 |
|     car     | 0.0  | 0.0  |
|     cat     | 0.03 | 0.11 |
|    chair    | 0.0  | 0.0  |
|     cow     | 0.0  | 0.0  |
| diningtable | 0.01 | 0.01 |
|     dog     | 0.0  | 0.0  |
|    horse    | 0.0  | 0.0  |
|  motorbike  | 0.34 | 0.41 |
|    person   | 0.01 | 0.12 |
+-------------+------+------+
Seen Summary:
+------+------+------+
| aAcc | mIoU | mAcc |
+------+------+------+
| 0.12 | 0.04 | 0.08 |
+------+------+------+

+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+------+------+
|    Class    | IoU  | Acc  |
+-------------+------+------+
| pottedplant | 0.0  | 0.0  |
|    sheep    | 0.0  | 0.0  |
|     sofa    | 0.01 | 0.02 |
|    train    | 0.02 | 0.06 |
|  tvmonitor  | 0.0  | nan  |
+-------------+------+------+
Unseen Summary:
+------+------+------+
| aAcc | mIoU | mAcc |
+------+------+------+
| 0.12 | 0.01 | 0.02 |
+------+------+------+

Harry-zzh commented 1 year ago

I encountered the same problem.

hwanyu112 commented 1 year ago

Hi, label 0 does not represent the background, because the dataloader sets "reduce_zero_label=True". I double-checked the inference code on my own machine, and the results look correct (see the attached screenshot).

I'm not sure whether this is due to data processing or the environment. I will try to find out why.
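
For context, "reduce_zero_label" is applied when the annotations are loaded. Below is a minimal MMSegmentation-style pipeline sketch of what that setting looks like; it is not copied from this repo, and the actual config files may differ:

# Minimal MMSegmentation-style pipeline sketch (illustrative, not from this repo).
# With reduce_zero_label=True, label 0 (background) is mapped to the ignore
# index 255 and all remaining class ids are shifted down by one at load time.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
]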

I had the same problem. I think the following line is problematic if label 0 in gt_semantic_seg represents the background class:

https://github.com/ZiqinZhou66/ZegCLIP/blob/a45454fc60f538d5b7b233eff8707c6f9ecad941/models/segmentor/zegclip.py#L142

So I added the following lines before the line above:

gt_semantic_seg[gt_semantic_seg == 0] = 255    # background (0) -> ignore index
gt_semantic_seg = gt_semantic_seg - 1          # shift class ids down by one
gt_semantic_seg[gt_semantic_seg == 254] = 255  # restore ignore index (255 - 1 = 254) to 255

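For reference, here is a quick standalone toy check of what that remapping does (a sketch assuming 0 = background, 1..20 = classes, 255 = ignore; not code from the repo):

# Standalone toy check of the remapping above (not part of the repo code).
import torch

gt_semantic_seg = torch.tensor([[0, 1, 20, 255]])
gt_semantic_seg[gt_semantic_seg == 0] = 255    # background -> ignore
gt_semantic_seg = gt_semantic_seg - 1          # shift class ids down by one
gt_semantic_seg[gt_semantic_seg == 254] = 255  # 254 (= 255 - 1) -> ignore again
print(gt_semantic_seg)                         # -> [[255, 0, 19, 255]]
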
Then the performance improved, but it is still not as good as the reported results, especially for the unseen classes, with inductive training for 20K iterations:

+++++++++++ Total classes +++++++++++++
per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
|  aeroplane  | 82.16 | 96.41 |
|   bicycle   | 98.42 | 98.78 |
|     bird    | 87.65 | 89.47 |
|     boat    |  88.1 | 90.22 |
|    bottle   | 96.01 | 98.25 |
|     bus     | 86.64 | 87.77 |
|     car     | 96.58 | 98.58 |
|     cat     | 34.34 |  87.5 |
|    chair    | 91.81 | 96.58 |
|     cow     | 66.64 | 68.45 |
| diningtable | 95.54 | 97.53 |
|     dog     |  94.8 | 95.97 |
|    horse    |  90.6 | 97.06 |
|  motorbike  | 95.28 | 97.43 |
|    person   | 76.74 | 79.97 |
| pottedplant | 11.68 | 16.36 |
|    sheep    |  3.61 | 10.32 |
|     sofa    |  0.0  |  0.0  |
|    train    |  0.22 |  0.25 |
|  tvmonitor  |  0.0  |  nan  |
+-------------+-------+-------+
Summary:
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 78.61 | 64.84 | 74.05 |
+-------+-------+-------+

+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
|  aeroplane  | 82.16 | 96.41 |
|   bicycle   | 98.42 | 98.78 |
|     bird    | 87.65 | 89.47 |
|     boat    |  88.1 | 90.22 |
|    bottle   | 96.01 | 98.25 |
|     bus     | 86.64 | 87.77 |
|     car     | 96.58 | 98.58 |
|     cat     | 34.34 |  87.5 |
|    chair    | 91.81 | 96.58 |
|     cow     | 66.64 | 68.45 |
| diningtable | 95.54 | 97.53 |
|     dog     |  94.8 | 95.97 |
|    horse    |  90.6 | 97.06 |
|  motorbike  | 95.28 | 97.43 |
|    person   | 76.74 | 79.97 |
+-------------+-------+-------+
Seen Summary:
+-------+-------+------+
|  aAcc |  mIoU | mAcc |
+-------+-------+------+
| 78.61 | 85.42 | 92.0 |
+-------+-------+------+

+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
| pottedplant | 11.68 | 16.36 |
|    sheep    |  3.61 | 10.32 |
|     sofa    |  0.0  |  0.0  |
|    train    |  0.22 |  0.25 |
|  tvmonitor  |  0.0  |  nan  |
+-------------+-------+-------+
Unseen Summary:
+-------+------+------+
|  aAcc | mIoU | mAcc |
+-------+------+------+
| 78.61 | 3.1  | 6.73 |
+-------+------+------+

When I tried to train the model following the README, the losses stayed almost unchanged. Did you run into this problem before? Thank you!

hwanyu112 commented 1 year ago

I get test results much better than the ones above, but they are still lower than the numbers in the paper. Besides, I find that testing after training for 1.8K iterations on my machine shows better performance than after 2K. Maybe the reason is a difference in environments?

Here is the inference results after training for 1.8K iterations:

+++++++++++ Total classes +++++++++++++
per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
|  aeroplane  | 97.34 | 97.72 |
|   bicycle   | 87.74 | 97.87 |
|     bird    | 98.56 |  98.8 |
|     boat    |  93.1 | 96.97 |
|    bottle   | 94.74 |  96.7 |
|     bus     | 97.55 | 98.18 |
|     car     | 95.36 | 98.38 |
|     cat     | 96.47 | 96.84 |
|    chair    |  53.7 | 59.87 |
|     cow     | 92.87 | 93.05 |
| diningtable |  84.8 | 89.13 |
|     dog     | 94.65 | 97.09 |
|    horse    | 95.21 | 97.07 |
|  motorbike  | 94.39 |  97.7 |
|    person   | 96.18 | 97.85 |
| pottedplant | 48.31 | 53.76 |
|    sheep    | 89.78 | 99.74 |
|     sofa    | 56.97 |  97.2 |
|    train    | 97.75 |  99.9 |
|  tvmonitor  | 29.58 | 32.23 |
+-------------+-------+-------+
Summary:
+------+-------+------+
| aAcc |  mIoU | mAcc |
+------+-------+------+
| 93.4 | 84.75 | 89.8 |
+------+-------+------+

+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
|  aeroplane  | 97.34 | 97.72 |
|   bicycle   | 87.74 | 97.87 |
|     bird    | 98.56 |  98.8 |
|     boat    |  93.1 | 96.97 |
|    bottle   | 94.74 |  96.7 |
|     bus     | 97.55 | 98.18 |
|     car     | 95.36 | 98.38 |
|     cat     | 96.47 | 96.84 |
|    chair    |  53.7 | 59.87 |
|     cow     | 92.87 | 93.05 |
| diningtable |  84.8 | 89.13 |
|     dog     | 94.65 | 97.09 |
|    horse    | 95.21 | 97.07 |
|  motorbike  | 94.39 |  97.7 |
|    person   | 96.18 | 97.85 |
+-------------+-------+-------+
Seen Summary:
+------+-------+-------+
| aAcc |  mIoU |  mAcc |
+------+-------+-------+
| 93.4 | 91.51 | 94.22 |
+------+-------+-------+

+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
| pottedplant | 48.31 | 53.76 |
|    sheep    | 89.78 | 99.74 |
|     sofa    | 56.97 |  97.2 |
|    train    | 97.75 |  99.9 |
|  tvmonitor  | 29.58 | 32.23 |
+-------------+-------+-------+
Unseen Summary:
+------+-------+-------+
| aAcc |  mIoU |  mAcc |
+------+-------+-------+
| 93.4 | 64.48 | 76.57 |
+------+-------+-------+

And here is the inference results after training for 2K iterations:

+++++++++++ Total classes +++++++++++++
per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
|  aeroplane  | 98.16 | 98.52 |
|   bicycle   | 89.11 | 97.76 |
|     bird    | 98.48 | 98.73 |
|     boat    | 93.34 |  96.8 |
|    bottle   | 94.44 | 96.18 |
|     bus     |  98.0 | 98.71 |
|     car     | 95.12 | 97.59 |
|     cat     | 96.55 |  96.8 |
|    chair    | 51.67 |  56.8 |
|     cow     | 93.61 | 93.81 |
| diningtable |  84.6 | 89.38 |
|     dog     | 95.02 | 97.55 |
|    horse    |  96.1 | 96.98 |
|  motorbike  | 94.41 | 97.77 |
|    person   |  96.1 | 97.84 |
| pottedplant | 40.15 | 44.11 |
|    sheep    | 88.94 | 99.75 |
|     sofa    | 54.46 | 97.26 |
|    train    | 95.52 | 99.76 |
|  tvmonitor  | 18.38 | 20.65 |
+-------------+-------+-------+
Summary:
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 92.86 | 83.61 | 88.64 |
+-------+-------+-------+

+++++++++++ Seen classes +++++++++++++
seen per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
|  aeroplane  | 98.16 | 98.52 |
|   bicycle   | 89.11 | 97.76 |
|     bird    | 98.48 | 98.73 |
|     boat    | 93.34 |  96.8 |
|    bottle   | 94.44 | 96.18 |
|     bus     |  98.0 | 98.71 |
|     car     | 95.12 | 97.59 |
|     cat     | 96.55 |  96.8 |
|    chair    | 51.67 |  56.8 |
|     cow     | 93.61 | 93.81 |
| diningtable |  84.6 | 89.38 |
|     dog     | 95.02 | 97.55 |
|    horse    |  96.1 | 96.98 |
|  motorbike  | 94.41 | 97.77 |
|    person   |  96.1 | 97.84 |
+-------------+-------+-------+
Seen Summary:
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 92.86 | 91.65 | 94.08 |
+-------+-------+-------+

+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------------+-------+-------+
|    Class    |  IoU  |  Acc  |
+-------------+-------+-------+
| pottedplant | 40.15 | 44.11 |
|    sheep    | 88.94 | 99.75 |
|     sofa    | 54.46 | 97.26 |
|    train    | 95.52 | 99.76 |
|  tvmonitor  | 18.38 | 20.65 |
+-------------+-------+-------+
Unseen Summary:
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 92.86 | 59.49 | 72.31 |
+-------+-------+-------+

Maddy12 commented 1 year ago

I am getting poor results on COCO when just running inference with the models shared in the repository.

I want to point out that these are evaluated on the COCO Panoptic dataset and not on COCO-Stuff. Is it possible these results are actually fine for a slightly different distribution, or is something else wrong?

Seen Summary:
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 28.39 | 29.62 | 44.45 |
+-------+-------+-------+

+++++++++++ Unseen classes +++++++++++++
unseen per class results:
+-------+-----+-----+
| Class | IoU | Acc |
+-------+-----+-----+
+-------+-----+-----+
Unseen Summary:
+-------+------+------+
|  aAcc | mIoU | mAcc |
+-------+------+------+
| 28.39 | nan  | nan  |
+-------+------+------+

The command is:

CHECKPOINT=weights/coco_fully_512_vit_base.pth
CONFIG_FILE=vpt_seg_fully_vit-b_512x512_80k_12_100_multi.py
python eval_zegclip_coco.py ${CONFIG_FILE} ${CHECKPOINT} --eval=mIoU --out $OUT_FILE