zankerx closed this issue 1 year ago
Hi @zankerx, that's correct: there is no CLS token. We just average-pool the outputs!
Hi @MidoAssran, I am trying ViT-B/14 for a classification task. The output of my transformer has shape n x 256 x 768, so which dimension should I average over? Should I reduce it to n x 256 or to n x 768?
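For what it's worth, here is a minimal sketch of what average pooling would look like for that shape. It assumes that 256 is the number of patch tokens and 768 is the embedding dimension, so the mean is taken over the token axis and each image ends up with one 768-dimensional vector (n x 768). The array contents, the batch size, `num_classes`, and the probe weights are made up for illustration, and NumPy is used only to keep the example self-contained; in PyTorch the equivalent pooling call would be `x.mean(dim=1)`.

```python
import numpy as np

# Hypothetical stand-in for the encoder output: a batch of n=4 images,
# 256 patch tokens per image, 768-dimensional embedding per token.
n, num_tokens, embed_dim = 4, 256, 768
rng = np.random.default_rng(0)
tokens = rng.standard_normal((n, num_tokens, embed_dim))

# Average over the token axis (axis 1): one feature vector per image.
pooled = tokens.mean(axis=1)
print(pooled.shape)  # (4, 768)

# A linear probe is then a single affine map on the pooled features.
# num_classes and the random weights are assumptions for illustration.
num_classes = 10
weight = rng.standard_normal((embed_dim, num_classes)) * 0.01
bias = np.zeros(num_classes)
logits = pooled @ weight + bias
print(logits.shape)  # (4, 10)
```

Averaging over the last axis instead would give n x 256, i.e. one scalar per patch, which throws away the feature channels the classifier needs.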
Hi, I have a question about linear probing: I haven't seen a CLS token. Is the classification performed directly on all of the outputs (which makes a lot of parameters for a single layer) or on an average of the outputs? Thanks!
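To put rough numbers on the parameter concern, the sketch below counts the weights of a single linear layer in both cases. It borrows the 256 x 768 output shape mentioned elsewhere in this thread and assumes 1000 classes; the helper `linear_probe_params` is hypothetical and not part of any library.

```python
def linear_probe_params(in_features: int, num_classes: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * num_classes + num_classes

num_tokens, embed_dim = 256, 768  # shape quoted in this thread
num_classes = 1000                # assumption for illustration

# Probe on all outputs, flattened to one long vector per image.
flat_params = linear_probe_params(num_tokens * embed_dim, num_classes)

# Probe on the average of the outputs.
pooled_params = linear_probe_params(embed_dim, num_classes)

print(flat_params)    # 196609000
print(pooled_params)  # 769000
```

Under these assumptions the pooled probe is about 256 times smaller, which is the usual reason for averaging before the linear layer.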