adaptivetokensampling / ATS

Adaptive Token Sampling for Efficient Vision Transformers (ECCV 2022 Oral Presentation)
https://adaptivetokensampling.github.io/
Apache License 2.0

Reproducing results Fig.5 #3

chloeskt opened this issue 1 year ago

chloeskt commented 1 year ago

Hello,

I am currently trying to reproduce the results given in Fig 5 (b) and (c), in "not finetuned" mode.

Here is my config for the 3 GFLOPs level, Stage 3, not fine-tuned:

TRAIN:
  ENABLE: False

TEST:
  ENABLE: True
  DATASET: ImageNet
  BATCH_SIZE: 1024
  CHECKPOINT_FILE_PATH: "/root/workspace/projects/ATS/models/deit_small_patch16_224-cd65a155.pth"
  NUM_ENSEMBLE_VIEWS: 1
  NUM_SPATIAL_CROPS: 1
  SAVE_RESULTS_PATH: "/root/no_backup/preds_ats.pkl"

DATA:
  PATH_TO_DATA_DIR: "/datasets_local/ImageNet/"
  TEST_CROP_SIZE: 224
  TRAIN_CROP_SIZE: 224
  MEAN: [0.485, 0.456, 0.406]
  STD: [0.229, 0.224, 0.225]

DATA_LOADER:
  NUM_WORKERS: 2

VIT:
  IMG_SIZE: 224
  PATCH_SIZE: 16
  IN_CHANNELS: 3
  NUM_CLASSES: 1000
  EMBED_DIM: 384
  DEPTH: 12
  NUM_HEADS: 6
  MLP_RATIO: 4.0
  QKV_BIAS: True
  QK_SCALE: None
  REPRESENTATION_SIZE: None
  DROP_RATE: 0.0
  ATTN_DROP_RATE: 0.0
  DROP_PATH_RATE: 0.0
  HYBRID_BACKBONE: None
  NORM_LAYER: None
  ATS_BLOCKS: [3]
  NUM_TOKENS: [108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108]
  DROP_TOKENS: True

NUM_GPUS: 1

And here is my config for the 3 GFLOPs level, Multi-stage, not fine-tuned:

TRAIN:
  ENABLE: False

TEST:
  ENABLE: True
  DATASET: ImageNet
  BATCH_SIZE: 1024
  CHECKPOINT_FILE_PATH: "/root/workspace/projects/ATS/models/deit_small_patch16_224-cd65a155.pth"
  NUM_ENSEMBLE_VIEWS: 1
  NUM_SPATIAL_CROPS: 1
  SAVE_RESULTS_PATH: "/root/no_backup/preds_ats.pkl"

DATA:
  PATH_TO_DATA_DIR: "/datasets_local/ImageNet/"
  TEST_CROP_SIZE: 224
  TRAIN_CROP_SIZE: 224
  MEAN: [0.485, 0.456, 0.406]
  STD: [0.229, 0.224, 0.225]

DATA_LOADER:
  NUM_WORKERS: 2

VIT:
  IMG_SIZE: 224
  PATCH_SIZE: 16
  IN_CHANNELS: 3
  NUM_CLASSES: 1000
  EMBED_DIM: 384
  DEPTH: 12
  NUM_HEADS: 6
  MLP_RATIO: 4.0
  QKV_BIAS: True
  QK_SCALE: None
  REPRESENTATION_SIZE: None
  DROP_RATE: 0.0
  ATTN_DROP_RATE: 0.0
  DROP_PATH_RATE: 0.0
  HYBRID_BACKBONE: None
  NORM_LAYER: None
  ATS_BLOCKS: [3, 4, 5, 6, 7, 8, 9, 10, 11]
  NUM_TOKENS: [108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108]
  DROP_TOKENS: True

NUM_GPUS: 1

However, I am not able to reach the top-1 accuracy you report in these figures. Could you please provide the config files used to produce Fig. 5 (b) and (c)?

Thank you in advance!

cuguniang commented 1 year ago

1. Replace n_ref_tokens with n_tokens:

ys = self.create_ys(normalized_cdf, n_tokens).unsqueeze(
    dim=2
)

2. Replace N with n_tokens:

unique_indices = self.get_unique_indices(
    indices=tokens_to_pick_ind, max_value=n_tokens - 1
)[:, : n_tokens - 1]
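
For context, here is a rough, self-contained sketch of the inverse-transform sampling step these two lines sit in. The names (inverse_transform_sample, etc.) are illustrative only, not the actual ats_block.py code; the point is just that n_tokens, rather than the reference token count N, should drive the number of sampling points ys:

import torch

def inverse_transform_sample(scores: torch.Tensor, n_tokens: int) -> torch.Tensor:
    """Illustrative inverse-transform sampling of token indices from per-token
    significance scores; a sketch, not the repo's ats_block.py implementation.

    scores:   (B, N) non-negative significance scores
    n_tokens: number of sampling points, i.e. how many tokens to draw
    returns:  (B, n_tokens) token indices (duplicates possible; the real code
              collapses them afterwards in its "unique indices" step)
    """
    B, N = scores.shape
    # Normalize the scores and build a CDF over the token axis.
    probs = scores / scores.sum(dim=1, keepdim=True).clamp(min=1e-6)
    cdf = probs.cumsum(dim=1)                                    # (B, N)
    # Evenly spaced sampling points in [0, 1); this is the role played by
    # create_ys(normalized_cdf, n_tokens): n_tokens, not N, decides how many
    # points there are.
    ys = torch.linspace(0.0, 1.0, steps=n_tokens + 1, device=scores.device)[:-1]
    ys = ys.unsqueeze(0).expand(B, -1).contiguous()              # (B, n_tokens)
    # For each sampling point, take the first token whose CDF value reaches it.
    idx = torch.searchsorted(cdf, ys)
    return idx.clamp(max=N - 1)

if __name__ == "__main__":
    torch.manual_seed(0)
    scores = torch.rand(2, 197)            # e.g. DeiT-S: 196 patch tokens + CLS
    picked = inverse_transform_sample(scores, n_tokens=108)
    print(picked.shape)                    # torch.Size([2, 108])
    print([int(torch.unique(p).numel()) for p in picked])  # tokens kept after dedup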

chloeskt commented 1 year ago

Thanks for your answer!

yutoby commented 9 months ago

Hi @chloeskt. I replaced the code in ats_block.py following @cuguniang's comment. Although the code now runs without any errors, I still could not reproduce the results given in Fig. 5 (c).

I ran the test with batch sizes of 1 and 1024, and with NUM_TOKENS set to 1.0, 0.87, and 171 (for NUM_TOKENS <= 1.0, the value is treated as a ratio). The results are:

batchsize=1, NUM_TOKENS=1.0: top-1 acc = 77.05%
batchsize=1, NUM_TOKENS=171: top-1 acc = 74.52%
batchsize=1024, NUM_TOKENS=1.0: top-1 acc = 78.34%
batchsize=1024, NUM_TOKENS=0.87: top-1 acc = 58.75%
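
For reference, this is roughly how I interpreted the ratio/count switch for NUM_TOKENS (my own assumption, not necessarily the repo's exact logic):

def resolve_num_tokens(num_tokens: float, n_input_tokens: int) -> int:
    # Assumption: values <= 1.0 are a keep ratio, larger values an absolute count.
    if num_tokens <= 1.0:
        return max(1, int(round(num_tokens * n_input_tokens)))
    return int(num_tokens)

# e.g. with 197 input tokens: 1.0 -> 197, 0.87 -> 171, 171 -> 171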

Have you successfully reproduced the result? Thank you.