Thanks for your work! I'm pretraining ViT-Tiny on my own dataset, but I can't determine the settings for the decoder's parameters (depth/embed_dim/num_heads). Should I keep them consistent with the ViT-Base/Large/Huge configs, or choose smaller values to make a lightweight decoder?
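For context, here is what I'm comparing: the MAE paper's default decoder (embed_dim=512, depth=8, num_heads=16, shared across the Base/Large/Huge encoders) versus a smaller decoder scaled down toward the ViT-Tiny encoder width (192). The lightweight values below are just an illustrative guess on my part, not something from the repo:

```python
from dataclasses import dataclass

@dataclass
class DecoderConfig:
    embed_dim: int
    depth: int
    num_heads: int

# MAE's default decoder, used unchanged for ViT-Base/Large/Huge encoders.
mae_default = DecoderConfig(embed_dim=512, depth=8, num_heads=16)

# Hypothetical lightweight decoder for a ViT-Tiny encoder (encoder width 192):
# shrink width and depth, keeping the per-head dimension at 64 (128 / 2).
tiny_decoder = DecoderConfig(embed_dim=128, depth=4, num_heads=2)

# sanity check: embed_dim must be divisible by num_heads
assert tiny_decoder.embed_dim % tiny_decoder.num_heads == 0
```

My intuition is that since the decoder is discarded after pretraining, a smaller one mainly risks weaker reconstruction targets, but I'd appreciate guidance on which scaling matters most.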