In this case, the classification head becomes very large and accounts for the majority of the learnable parameters. For example, in my PyTorch implementation with vit_base and the prompt_token = 30 setting mentioned in the paper, the number of learnable parameters reaches 30M.
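
One way to check where the learnable parameters come from is to count them per top-level module. The following is only a minimal sketch, not the paper's code: `ToyPromptedModel`, `trainable_params_by_module`, the single linear layer standing in for the frozen backbone, `head_in_features`, and `num_classes = 100` are all hypothetical names and values chosen for illustration. Only the embedding width of 768 (vit_base) and the 30 prompt tokens come from the setup described above.

```python
import torch
from torch import nn


def trainable_params_by_module(model: nn.Module) -> dict:
    """Sum the sizes of parameters with requires_grad=True, grouped by top-level attribute name."""
    counts = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen weights are not learnable, so skip them
        top_level = name.split(".")[0]
        counts[top_level] = counts.get(top_level, 0) + param.numel()
    return counts


class ToyPromptedModel(nn.Module):
    """Hypothetical stand-in: frozen backbone, learnable prompt tokens, learnable head."""

    def __init__(self, embed_dim=768, num_prompt_tokens=30,
                 head_in_features=768, num_classes=100):
        super().__init__()
        # Assumption: one linear layer stands in for the frozen ViT backbone.
        self.backbone = nn.Linear(embed_dim, embed_dim)
        for param in self.backbone.parameters():
            param.requires_grad = False
        # Learnable prompt tokens: num_prompt_tokens x embed_dim values.
        self.prompt = nn.Parameter(torch.zeros(1, num_prompt_tokens, embed_dim))
        # Head size is head_in_features * num_classes weights plus num_classes biases.
        self.head = nn.Linear(head_in_features, num_classes)


counts = trainable_params_by_module(ToyPromptedModel())
total = sum(counts.values())
for module_name, n in counts.items():
    print(f"{module_name}: {n} ({100 * n / total:.1f}% of learnable parameters)")
```

Running the same helper on the real model should show directly whether the head or the prompt tokens account for most of the 30M, since the head grows linearly with whatever feature size is fed into it.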