Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting
https://verg-avesta.github.io/CounTR_Webpage/
MIT License

Other Architecture #14

Closed jaideep11061982 closed 1 year ago

jaideep11061982 commented 1 year ago

Hi @Verg-Avesta,

1) What other architectures from the list below can we use with demo.py?
2) Do we have pretrained weights for all of the architectures in the list?
3) Did you try ViT-Small, which uses a patch size of 32?


# set recommended archs
mae_vit_base_patch16 = mae_vit_base_patch16_dec512d8b    # decoder: 512-dim, 8 blocks
mae_vit_base4_patch16 = mae_vit_base_patch16_fim4        # decoder: 4 blocks
mae_vit_base6_patch16 = mae_vit_base_patch16_fim6        # decoder: 6 blocks
mae_vit_large_patch16 = mae_vit_large_patch16_dec512d8b  # decoder: 512-dim, 8 blocks
mae_vit_huge_patch14 = mae_vit_huge_patch14_dec512d8b    # decoder: 512-dim, 8 blocks

Verg-Avesta commented 1 year ago

1. All architectures can be used, but you need to pre-train and fine-tune those networks yourself.
2. No. You can try pre-training and fine-tuning the other architectures yourself.
3. Here, "patch16" means that the image patch size is 16x16. I didn't try a patch size of 32 because it is too large for a single RTX 3090.
jaideep11061982 commented 1 year ago

@Verg-Avesta are we expected to get better results with the heavier architectures you have listed? They differ in embedding size and number of layers, I suppose.
mae_vit_large_patch16_dec512d8b

Verg-Avesta commented 1 year ago

There may be a small boost in results, but given the small size of the FSC-147 dataset and the difficulty of training large models, I don't think the results will improve greatly.
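
For context on how much the listed backbones differ in size, a rough back-of-the-envelope sketch using the standard ViT encoder configurations (decoder and counting head excluded, biases and norms ignored):

# Approximate encoder parameter counts for the standard ViT configs behind
# the listed archs. Per block: ~4*d^2 for attention + ~8*d^2 for the MLP
# (mlp_ratio=4), i.e. roughly 12*d^2 per block.
def approx_encoder_params_m(embed_dim, depth):
    return 12 * embed_dim ** 2 * depth / 1e6  # in millions

configs = {
    "ViT-Base  (mae_vit_base_patch16)":  (768, 12),
    "ViT-Large (mae_vit_large_patch16)": (1024, 24),
    "ViT-Huge  (mae_vit_huge_patch14)":  (1280, 32),
}
for name, (dim, depth) in configs.items():
    print(f"{name}: ~{approx_encoder_params_m(dim, depth):.0f}M params")

This prints roughly 85M, 302M, and 629M, in line with the commonly quoted 86M / 307M / 632M figures for ViT-Base/Large/Huge, so the jump from Base to Large is already a 3-4x increase in encoder parameters.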