Gabrysse closed this issue 4 months ago.
Hi, the scores in the paper are based on ViT-L/14@336px, and all other experiments in the paper use the ViT-L variant unless stated otherwise. You might want to try ViT-L with its config to reproduce the results in the paper. It's hard for us to validate your results, as we haven't tried ViT-B.
Thanks!
Hi, I tried to train your model with CLIP frozen. In Table 6 of your paper, you report that the model achieves 10.4 on ADE847. Unfortunately, after training your model with CLIP frozen, I obtained only ~4.6 mIoU on ADE847. I'm using the following command for training:
Am I missing something?