ZiqinZhou66 / ZegCLIP

Official implementation of the CVPR 2023 paper "ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation"

Questions about ZegCLIP training #7

Closed · Qyizos closed this 1 year ago

Qyizos commented 1 year ago

I am very happy to see your work on ZegCLIP; it is very interesting and very helpful for me. However, I am having a little trouble with your code.

I used the .pth file you provided for inference and got results consistent with the paper. However, when I train with the same Docker environment and source code, the inference results deviate from yours. For the inductive setting on the VOC dataset, my results differ from yours by less than 2 points, but for the inductive setting on the COCO dataset the gap is as large as 7 points. The experimental results are shown in the attachments.

Can you help answer this question?

[Attachments: COCO_Inductive, VOC_Inductive]

ZiqinZhou66 commented 1 year ago

I appreciate your interest in our work.

Could you please confirm whether the mask weight you used in https://github.com/ZiqinZhou66/ZegCLIP/blob/aa63859603e6b243c1f8f3722bb4e4715aa0a422/configs/coco/vpt_seg_fully_vit-b_512x512_80k_12_100_multi.py#L71 is 20 or 100? I noticed that the parameter in the original config was 20, but I actually used 100 for training, so I uploaded a new config. Using the old value may affect the performance.
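For reference, the relevant part of the MMSegmentation-style Python config looks roughly like the sketch below. The field names here are illustrative rather than copied from the repo, so please check the linked line for the exact keys; only the mask-weight value (20 vs. 100) is the point.

```python
# Hypothetical excerpt of the MMSegmentation-style config referenced above.
# Field names are illustrative; only the mask-weight value (20 vs. 100) matters here.
model = dict(
    decode_head=dict(
        loss_decode=dict(
            type='SegLossPlus',   # placeholder loss name; see the actual config
            mask_weight=100.0,    # the old config used 20; the updated config uses 100
        ),
    ),
)
```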

Besides, a trick widely used in previous works, slightly reducing the logits of the seen classes, is helpful in the inductive setting. I set this factor to 0.1. Did you also use it in your inference? You may also try changing the factor to see the difference.
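As a rough sketch of what that trick looks like at inference time (a generic illustration, not the exact code in this repo; whether the factor is subtracted or multiplied, and the class split, should be checked against the implementation):

```python
import torch

def suppress_seen_logits(seg_logits: torch.Tensor, seen_idx, factor: float = 0.1):
    """Down-weight seen-class logits before the per-pixel argmax (inductive setting).

    seg_logits: tensor of shape (num_classes, H, W) with raw class scores.
    seen_idx:   indices of classes seen during training.
    factor:     the 0.1 mentioned above (illustrative; check the repo for the exact use).
    """
    adjusted = seg_logits.clone()
    adjusted[seen_idx] -= factor  # one common variant subtracts a small constant
    return adjusted

# Example with assumed numbers: 171 COCO-Stuff classes, the first 156 treated as seen.
logits = torch.randn(171, 512, 512)
pred = suppress_seen_logits(logits, seen_idx=list(range(156))).argmax(dim=0)
```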

Qyizos commented 1 year ago

Thanks for your quick reply. The discrepancy was caused by my overlooking the number of training iterations of the model.

In MMSeg, the effective batch size is num_gpus * samples_per_gpu. Your paper mentions training on 4 GPUs, which I had overlooked; I was using only a single card, so the amount of training data seen per iteration was only 1/4 of yours. After making up the full amount of training, the performance improved significantly.
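For anyone else reproducing this, the relevant piece of the MMSeg-style config looks roughly like the sketch below. The values are assumed from the 4-GPU, batch-size-16 setup discussed in this thread; take the exact fields from the repo's config.

```python
# Effective batch size in MMSegmentation = num_gpus * samples_per_gpu.
# With the paper's 4-GPU setup this gives 4 * 4 = 16.
data = dict(
    samples_per_gpu=4,   # per-GPU batch size (assumed value)
    workers_per_gpu=4,
)

# Single-GPU reproduction options (memory permitting):
#   1) set samples_per_gpu=16 to keep the effective batch size at 16, or
#   2) keep samples_per_gpu=4 and train ~4x the iterations, adjusting the LR accordingly.
```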

However, the result is still 1-2 points below the paper. I think this is because I increased the number of iterations substantially without adjusting hyperparameters such as the learning rate accordingly.
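If the effective batch size does change, a common heuristic (not something prescribed by this repo; the base values below are assumed for illustration) is the linear scaling rule for the learning rate:

```python
# Linear scaling rule (rough heuristic, assumed base values for illustration).
base_lr = 1e-4           # LR tuned for the reference batch size
base_batch = 16          # effective batch size used in the paper
my_batch = 4             # effective batch size on a single GPU
scaled_lr = base_lr * my_batch / base_batch   # -> 2.5e-05
```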

Thank you once again!

ZiqinZhou66 commented 1 year ago

Thank you for your feedback. I should be more explicit that the batch size must be 16 to reproduce the results of our work. Best of luck with your research.

aliman80 commented 1 year ago

@Qyizos Hi, I am trying to validate the results on cocostuff164k. I get very good results for 11 classes, but zero for all the rest. Can you tell me what I am missing? I ran the code with only the data path updated; the rest of the repo code was unchanged.