facebookresearch / ov-seg

This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.

Reproducing the results of baseline w/ original CLIP #20

Closed — zifuwan closed this issue 1 year ago

zifuwan commented 1 year ago

Dear author,

Thanks for the great work! Could you tell me how many epochs are needed to train the baseline w/ original CLIP? I've trained for 2 epochs (10,000 iters), and the loss seems to have converged already. However, there is a gap between my testing results and yours. I tested the weight you provided and got 29.6 on ADE-150, which matches. My own result, though, is only 18.0, while according to your paper it should be 21.8. Could you help me out?

Here are my training results and training logs: [screenshots attached]

Thanks.

Jeff-LiangF commented 1 year ago

Hi, thanks for your questions. I just checked my experiment records. 60k–120k training iters are usually good choices, so your model may be undertrained. Moreover, the performance is also sensitive to CLIP_ENSEMBLE_WEIGHT, so you may want to tune it a bit to achieve the best results.
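For readers wondering why this weight matters: an ensemble weight like CLIP_ENSEMBLE_WEIGHT typically blends the segmentation model's in-vocabulary class scores with CLIP's scores, so shifting it changes which source dominates the final prediction. A minimal NumPy sketch of a geometric ensemble (the function name and exact formula here are illustrative assumptions, not the repo's actual implementation):

```python
import numpy as np

def ensemble_scores(mask_cls_scores, clip_scores, clip_ensemble_weight=0.7):
    """Geometrically blend per-class scores from the mask classifier with
    per-class scores from CLIP. clip_ensemble_weight plays the role of
    CLIP_ENSEMBLE_WEIGHT: 0 trusts only the mask classifier, 1 trusts
    only CLIP. (Illustrative sketch, not the repo's code.)"""
    w = clip_ensemble_weight
    return mask_cls_scores ** (1 - w) * clip_scores ** w

# Example: the two sources disagree on the top class,
# so the chosen weight decides which one wins.
mask_scores = np.array([0.6, 0.3, 0.1])  # mask classifier favors class 0
clip_scores = np.array([0.2, 0.7, 0.1])  # CLIP favors class 1
print(ensemble_scores(mask_scores, clip_scores, 0.5))
```

Because the two score distributions can be miscalibrated relative to each other, the best weight is dataset-dependent, which is why a small sweep over it is worthwhile.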

Thanks! I closed the issue. Feel free to reopen it if you have further questions.