facebookresearch/dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

How to evaluate the pretrained linear classification heads on ImageNet? #284

Closed m-shahbaz-kharal closed 10 months ago

m-shahbaz-kharal commented 11 months ago

Hi,

I first ran python run/eval/knn.py <args> and it worked; I was able to reproduce the results in the paper. But then I tried to evaluate the 1-layer classification head via:

python dinov2/run/eval/linear.py \
    --config-file dinov2/configs/eval/vits14_pretrain.yaml \
    --pretrained-weights pretrained/dinov2_vits14_pretrain.pth \
    --classifier-fpath pretrained/dinov2_vits14_linear_head.pth \
    --output-dir outputs/linear/s/ \
    --no-resume \
    --train-dataset ImageNet:split=TRAIN:root=dataset:extra=dataset \
    --val-dataset ImageNet:split=VAL:root=dataset:extra=dataset

and it gives the following log.err:

/lustre/fs1/home/cap6411.student12/dinov2/dinov2/layers/swiglu_ffn.py:43: UserWarning: xFormers is available (SwiGLU)
  warnings.warn("xFormers is available (SwiGLU)")
/lustre/fs1/home/cap6411.student12/dinov2/dinov2/layers/attention.py:27: UserWarning: xFormers is available (Attention)
  warnings.warn("xFormers is available (Attention)")
/lustre/fs1/home/cap6411.student12/dinov2/dinov2/layers/block.py:33: UserWarning: xFormers is available (Block)
  warnings.warn("xFormers is available (Block)")
submitit ERROR (2023-10-27 16:53:44,246) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/submission.py", line 76, in submitit_main
    process_job(args.folder)
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/submission.py", line 69, in process_job
    raise error
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/submission.py", line 55, in process_job
    result = delayed.result()
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/utils.py", line 133, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/lustre/fs1/home/cap6411.student12/dinov2/dinov2/run/eval/linear.py", line 26, in __call__
    linear_main(self.args)
  File "/lustre/fs1/home/cap6411.student12/dinov2/dinov2/eval/linear.py", line 597, in main
    run_eval_linear(
  File "/lustre/fs1/home/cap6411.student12/dinov2/dinov2/eval/linear.py", line 521, in run_eval_linear
    start_iter = checkpointer.resume_or_load(classifier_fpath or "", resume=resume).get("iteration", -1) + 1
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/fvcore/common/checkpoint.py", line 227, in resume_or_load
    return self.load(path, checkpointables=[])
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/fvcore/common/checkpoint.py", line 156, in load
    incompatible = self._load_model(checkpoint)
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/fvcore/common/checkpoint.py", line 272, in _load_model
    checkpoint_state_dict = checkpoint.pop("model")
KeyError: 'model'
srun: error: evc6: task 1: Exited with exit code 1
srun: error: evc6: task 0: Exited with exit code 1

and the following log.out:

submitit INFO (2023-10-27 16:52:58,265) - Starting with JobEnvironment(job_id=273551, hostname=evc6, local_rank=0(2), node=0(1), global_rank=0(2))
submitit INFO (2023-10-27 16:52:58,266) - Loading pickle: /lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s/273551_submitted.pkl
I20231027 16:53:29 28950 dinov2 config.py:59] git:
  sha: da4b3825f0ed64b7398ace00c5062503811d0cff, status: has uncommitted changes, branch: main

I20231027 16:53:29 28950 dinov2 config.py:60] batch_size: 128
classifier_fpath: pretrained/dinov2_vits14_linear_head.pth
comment: 
config_file: dinov2/configs/eval/vits14_pretrain.yaml
epoch_length: 1250
epochs: 10
eval_period_iterations: 1250
exclude: 
learning_rates: [1e-05, 2e-05, 5e-05, 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1]
ngpus: 2
no_resume: True
nodes: 1
num_workers: 8
opts: ['train.output_dir=/lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s']
output_dir: /lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s
partition: normal
pretrained_weights: pretrained/dinov2_vits14_pretrain.pth
save_checkpoint_frequency: 20
test_class_mapping_fpaths: [None]
test_dataset_strs: None
test_metric_types: None
timeout: 180
train_dataset_str: ImageNet:split=TRAIN:root=dataset:extra=dataset
use_volta32: True
val_class_mapping_fpath: None
val_dataset_str: ImageNet:split=VAL:root=dataset:extra=dataset
val_metric_type: mean_accuracy
I20231027 16:53:29 28950 dinov2 config.py:26] sqrt scaling learning rate; base: 0.004, new: 0.0014142135623730952
I20231027 16:53:29 28950 dinov2 config.py:33] MODEL:
  WEIGHTS: ''
compute_precision:
  grad_scaler: true
  teacher:
    backbone:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
    dino_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
    ibot_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
  student:
    backbone:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
    dino_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp32
        buffer_dtype: fp32
    ibot_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp32
        buffer_dtype: fp32
dino:
  loss_weight: 1.0
  head_n_prototypes: 65536
  head_bottleneck_dim: 256
  head_nlayers: 3
  head_hidden_dim: 2048
  koleo_loss_weight: 0.1
ibot:
  loss_weight: 1.0
  mask_sample_probability: 0.5
  mask_ratio_min_max:
  - 0.1
  - 0.5
  separate_head: false
  head_n_prototypes: 65536
  head_bottleneck_dim: 256
  head_nlayers: 3
  head_hidden_dim: 2048
train:
  batch_size_per_gpu: 64
  dataset_path: ImageNet:split=TRAIN
  output_dir: /lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s
  saveckp_freq: 20
  seed: 0
  num_workers: 10
  OFFICIAL_EPOCH_LENGTH: 1250
  cache_dataset: true
  centering: centering
student:
  arch: vit_small
  patch_size: 14
  drop_path_rate: 0.3
  layerscale: 1.0e-05
  drop_path_uniform: true
  pretrained_weights: ''
  ffn_layer: mlp
  block_chunks: 0
  qkv_bias: true
  proj_bias: true
  ffn_bias: true
  num_register_tokens: 0
  interpolate_antialias: false
  interpolate_offset: 0.1
teacher:
  momentum_teacher: 0.992
  final_momentum_teacher: 1
  warmup_teacher_temp: 0.04
  teacher_temp: 0.07
  warmup_teacher_temp_epochs: 30
optim:
  epochs: 100
  weight_decay: 0.04
  weight_decay_end: 0.4
  base_lr: 0.004
  lr: 0.0014142135623730952
  warmup_epochs: 10
  min_lr: 1.0e-06
  clip_grad: 3.0
  freeze_last_layer_epochs: 1
  scaling_rule: sqrt_wrt_1024
  patch_embed_lr_mult: 0.2
  layerwise_decay: 0.9
  adamw_beta1: 0.9
  adamw_beta2: 0.999
crops:
  global_crops_scale:
  - 0.32
  - 1.0
  local_crops_number: 8
  local_crops_scale:
  - 0.05
  - 0.32
  global_crops_size: 518
  local_crops_size: 98
evaluation:
  eval_period_iterations: 12500

I20231027 16:53:29 28950 dinov2 vision_transformer.py:122] using MLP layer as FFN
I20231027 16:53:39 28950 dinov2 utils.py:33] Pretrained weights found at pretrained/dinov2_vits14_pretrain.pth and loaded with msg: <All keys matched successfully>
I20231027 16:53:39 28950 dinov2 loaders.py:84] using dataset: "ImageNet:split=TRAIN:root=dataset:extra=dataset"
I20231027 16:53:39 28950 dinov2 loaders.py:89] # of dataset samples: 1,281,167
I20231027 16:53:44 28950 fvcore.common.checkpoint checkpoint.py:150] [Checkpointer] Loading from pretrained/dinov2_vits14_linear_head.pth ...
submitit ERROR (2023-10-27 16:53:44,246) - Submitted job triggered an exception
E20231027 16:53:44 28950 submitit submission.py:68] Submitted job triggered an exception

I assumed that since the dinov2 backbone weights are there and the classification head weights are also given, I should be able to load the classifier and evaluate it.
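
Inspecting the head checkpoint directly shows what fvcore's Checkpointer trips over: it expects a training checkpoint with a top-level "model" key (see the checkpoint.pop("model") frame in the traceback), while the released head file appears to be a bare state dict. A minimal check, using the same path as in my command:

import torch

# Load the released linear head on CPU and list its top-level keys.
# fvcore's Checkpointer calls checkpoint.pop("model"), which fails here
# because the file is a plain state dict with no "model" wrapper.
ckpt = torch.load("pretrained/dinov2_vits14_linear_head.pth", map_location="cpu")
print(list(ckpt.keys()))  # no "model" key among these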

And one more thing: if I remove --classifier-fpath pretrained/dinov2_vits14_linear_head.pth, it says no checkpoint is found and starts training the classifier, even though the script is supposed to be for evaluation (as the path run/eval/linear.py suggests).

patricklabatut commented 10 months ago

The evaluation script actually trains classifiers (a sweep over learning rates, as in the config above) in order to evaluate them, which is why it starts training when no checkpoint is found. To evaluate a specific pretrained classifier (like the provided models), one could either adapt the code to bypass the training and classifier selection and only run the corresponding evaluation code, or, perhaps more simply, manually run inference and compute accuracy on the ImageNet validation set.
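
For the second option, here is a minimal sketch of a manual evaluation. It assumes the dinov2_vits14_lc torch.hub entry point (backbone plus the matching pretrained ImageNet linear head) and a locally prepared ImageNet root ("dataset" below is a placeholder); the preprocessing follows the standard resize-256/center-crop-224 evaluation transform:

import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageNet

# Backbone + pretrained linear classification head as a single module.
# Assumption: the hub exposes "dinov2_vits14_lc" for ViT-S/14; other
# backbones follow the same naming scheme (e.g. dinov2_vitb14_lc).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14_lc")
model = model.cuda().eval()

# Standard ImageNet evaluation preprocessing.
transform = T.Compose([
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Placeholder root; point this at the local ImageNet directory.
val = ImageNet(root="dataset", split="val", transform=transform)
loader = DataLoader(val, batch_size=256, num_workers=8, pin_memory=True)

correct = total = 0
with torch.inference_mode():
    for images, labels in loader:
        logits = model(images.cuda(non_blocking=True))
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.numel()
print(f"top-1 accuracy: {correct / total:.4f}")

If the hub entry point is not available in a given checkout, the released dinov2_vits14_linear_head.pth weights can be loaded onto a plain linear layer instead, but note that the head then expects exactly the feature layout (which class tokens and pooled patch tokens are concatenated) that it was trained with.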