Hey @yhyang-myron,
I understand the confusion; I may not have phrased it clearly in the paper. "CLIP only" describes the pretraining stage, where we used the text features for anchoring. After that, we fine-tuned with the standard method, using an unweighted cross-entropy loss. So "only" means that we didn't use the class-balanced focal loss or the instance sampling for tail categories.
Hope this clears it up, but let me know if you have any more questions!
Cheers, David
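For readers following along, the difference between the two losses mentioned above can be sketched in a few lines. This is only an illustration, not the repository's code: the function names, the `beta` and `gamma` values, and the class counts are all assumptions, and the class-balanced weighting uses the common effective-number form, which may differ from what the paper uses.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of per-class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Standard unweighted cross entropy for one sample:
    # every class contributes with the same weight.
    return -math.log(softmax(logits)[target])

def class_balanced_focal(logits, target, class_counts, beta=0.999, gamma=2.0):
    # Hypothetical class-balanced focal loss for one sample.
    # Assumption: weight = (1 - beta) / (1 - beta ** n_c), where n_c is the
    # number of training samples of the target class, so rare (tail)
    # classes receive a larger weight than frequent (head) classes.
    p = softmax(logits)[target]
    weight = (1.0 - beta) / (1.0 - beta ** class_counts[target])
    # The focal term (1 - p) ** gamma down-weights confident predictions.
    return weight * (1.0 - p) ** gamma * -math.log(p)

# Made-up example: class 0 is a head class, class 2 is a tail class.
class_counts = [10000, 500, 20]
logits = [2.0, 0.5, -1.0]
print("unweighted CE:", cross_entropy(logits, 2))
print("class-balanced focal:", class_balanced_focal(logits, 2, class_counts))
```

In this sketch, the "CLIP only" setting corresponds to fine-tuning with `cross_entropy` alone, without the tail-oriented reweighting.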
I see, thank you!
Hi, I see there is a "CLIP only" result of 27.73 mIoU. How was this result obtained? Is it the result of the pretraining stage alone, or were the fine-tuning methods also applied? Thanks a lot!