Ackesnal / GTP-ViT

This is the official code for the paper "Token Summarisation for Efficient Vision Transformers via Graph-based Token Propagation".
Apache License 2.0

Pretrained model #1

Open kangkang189 opened 10 months ago

kangkang189 commented 10 months ago

Pretrained model

Ackesnal commented 10 months ago

Hi there,

We do not have pretrained model weights. This method works as a pluggable component for existing ViT backbones. For example, you can download the DeiT-Small model weights from https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth and apply our method to DeiT-Small.

You can refer to the end of models_v3.py to find all the backbones we currently support and download their model weights yourself. These weights are usually available on HuggingFace.
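For readers who want a starting point, here is a minimal sketch of fetching the DeiT-Small checkpoint linked above and loading it into a backbone. It is not the repository's own loading code. The helper names extract_state_dict and load_backbone_weights are hypothetical, and `model` is assumed to be a torch.nn.Module you have already built from one of the backbones in models_v3.py (the constructor is not shown here). The sketch also assumes the parameter names of that module match the DeiT checkpoint, which is why it reports mismatched keys instead of failing.

```python
# Hypothetical helpers; not part of the GTP-ViT repository.
DEIT_SMALL_URL = (
    "https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth"
)


def extract_state_dict(checkpoint):
    """Return the weight mapping from a checkpoint.

    The official DeiT checkpoints store the weights under the "model" key;
    other checkpoints may already be a plain state dict.
    """
    if isinstance(checkpoint, dict) and "model" in checkpoint:
        return checkpoint["model"]
    return checkpoint


def load_backbone_weights(model, url=DEIT_SMALL_URL):
    """Download a checkpoint and load it into `model` (a torch.nn.Module).

    Returns the lists of missing and unexpected keys so that a silent
    mismatch between the checkpoint and the model is easy to spot.
    """
    import torch  # imported lazily so the helpers above work without torch

    checkpoint = torch.hub.load_state_dict_from_url(
        url, map_location="cpu", check_hash=True
    )
    state_dict = extract_state_dict(checkpoint)
    # strict=False is an assumption: it tolerates extra or missing keys.
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)
```

If either returned list is non-empty, the backbone is probably running with some randomly initialised layers, which is worth checking before evaluating accuracy.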

I'll provide more details on how to use our method in the README later. Unfortunately, I was injured and underwent surgery on my right shoulder two weeks ago. My right arm and hand are currently immobilised in a sling, so the README will probably take longer to update.

Kind regards, Xuwei

tanvir-utexas commented 6 months ago

I am not sure why I always get 0 accuracy with the models listed in models_v3.py. I tried the AugReg models. Should it work without pre-training?

Update: I was able to make it run. Thanks!

Ackesnal commented 6 months ago

Hi @tanvir-utexas,

You will need to use the corresponding pre-trained weights since these models might utilize different preprocessing methods. Most of the models can be downloaded from HuggingFace.
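As one illustration of how preprocessing can differ between checkpoints, the sketch below compares two common input normalisations. It is not taken from the repository, and it does not claim to explain the 0 accuracy reported above. The constants are the standard ImageNet statistics and the 0.5/0.5 statistics; the assumption here, worth verifying against each checkpoint's own configuration, is that DeiT weights expect the former and AugReg ViT weights the latter. The names IMAGENET_DEFAULT, HALF_RANGE and normalize are hypothetical.

```python
# Normalisation statistics (per RGB channel, for inputs scaled to [0, 1]).
IMAGENET_DEFAULT = {
    "mean": (0.485, 0.456, 0.406),
    "std": (0.229, 0.224, 0.225),
}
HALF_RANGE = {
    "mean": (0.5, 0.5, 0.5),
    "std": (0.5, 0.5, 0.5),
}


def normalize(pixel, stats):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return tuple(
        (value - mean) / std
        for value, mean, std in zip(pixel, stats["mean"], stats["std"])
    )


# The same white pixel reaches the model with different values depending
# on which statistics the evaluation pipeline uses.
white = (1.0, 1.0, 1.0)
deit_style_input = normalize(white, IMAGENET_DEFAULT)
augreg_style_input = normalize(white, HALF_RANGE)
```

Feeding a model inputs normalised with the wrong statistics shifts every pixel value it sees, so the evaluation transform should match the one the weights were trained with.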

Cheers, Xuwei