facebookresearch/dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

How to evaluate the pretrained linear classification heads on ImageNet? #284

Closed m-shahbaz-kharal closed 10 months ago

m-shahbaz-kharal commented 11 months ago

Hi,

I first ran python run/eval/knn.py <args> and it worked; I was able to reproduce the results in the paper. But then I tried to evaluate the 1-layer classification head via:

python dinov2/run/eval/linear.py \
    --config-file dinov2/configs/eval/vits14_pretrain.yaml \
    --pretrained-weights pretrained/dinov2_vits14_pretrain.pth \
    --classifier-fpath pretrained/dinov2_vits14_linear_head.pth \
    --output-dir outputs/linear/s/ \
    --no-resume \
    --train-dataset ImageNet:split=TRAIN:root=dataset:extra=dataset \
    --val-dataset ImageNet:split=VAL:root=dataset:extra=dataset

and it gives the following log.err:

/lustre/fs1/home/cap6411.student12/dinov2/dinov2/layers/swiglu_ffn.py:43: UserWarning: xFormers is available (SwiGLU)
  warnings.warn("xFormers is available (SwiGLU)")
/lustre/fs1/home/cap6411.student12/dinov2/dinov2/layers/attention.py:27: UserWarning: xFormers is available (Attention)
  warnings.warn("xFormers is available (Attention)")
/lustre/fs1/home/cap6411.student12/dinov2/dinov2/layers/block.py:33: UserWarning: xFormers is available (Block)
  warnings.warn("xFormers is available (Block)")
submitit ERROR (2023-10-27 16:53:44,246) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/submission.py", line 76, in submitit_main
    process_job(args.folder)
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/submission.py", line 69, in process_job
    raise error
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/submission.py", line 55, in process_job
    result = delayed.result()
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/submitit/core/utils.py", line 133, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/lustre/fs1/home/cap6411.student12/dinov2/dinov2/run/eval/linear.py", line 26, in __call__
    linear_main(self.args)
  File "/lustre/fs1/home/cap6411.student12/dinov2/dinov2/eval/linear.py", line 597, in main
    run_eval_linear(
  File "/lustre/fs1/home/cap6411.student12/dinov2/dinov2/eval/linear.py", line 521, in run_eval_linear
    start_iter = checkpointer.resume_or_load(classifier_fpath or "", resume=resume).get("iteration", -1) + 1
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/fvcore/common/checkpoint.py", line 227, in resume_or_load
    return self.load(path, checkpointables=[])
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/fvcore/common/checkpoint.py", line 156, in load
    incompatible = self._load_model(checkpoint)
  File "/home/cap6411.student12/my-envs/dinov2/lib/python3.9/site-packages/fvcore/common/checkpoint.py", line 272, in _load_model
    checkpoint_state_dict = checkpoint.pop("model")
KeyError: 'model'
srun: error: evc6: task 1: Exited with exit code 1
srun: error: evc6: task 0: Exited with exit code 1

and the following log.out:

submitit INFO (2023-10-27 16:52:58,265) - Starting with JobEnvironment(job_id=273551, hostname=evc6, local_rank=0(2), node=0(1), global_rank=0(2))
submitit INFO (2023-10-27 16:52:58,266) - Loading pickle: /lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s/273551_submitted.pkl
I20231027 16:53:29 28950 dinov2 config.py:59] git:
  sha: da4b3825f0ed64b7398ace00c5062503811d0cff, status: has uncommitted changes, branch: main

I20231027 16:53:29 28950 dinov2 config.py:60] batch_size: 128
classifier_fpath: pretrained/dinov2_vits14_linear_head.pth
comment: 
config_file: dinov2/configs/eval/vits14_pretrain.yaml
epoch_length: 1250
epochs: 10
eval_period_iterations: 1250
exclude: 
learning_rates: [1e-05, 2e-05, 5e-05, 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1]
ngpus: 2
no_resume: True
nodes: 1
num_workers: 8
opts: ['train.output_dir=/lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s']
output_dir: /lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s
partition: normal
pretrained_weights: pretrained/dinov2_vits14_pretrain.pth
save_checkpoint_frequency: 20
test_class_mapping_fpaths: [None]
test_dataset_strs: None
test_metric_types: None
timeout: 180
train_dataset_str: ImageNet:split=TRAIN:root=dataset:extra=dataset
use_volta32: True
val_class_mapping_fpath: None
val_dataset_str: ImageNet:split=VAL:root=dataset:extra=dataset
val_metric_type: mean_accuracy
I20231027 16:53:29 28950 dinov2 config.py:26] sqrt scaling learning rate; base: 0.004, new: 0.0014142135623730952
I20231027 16:53:29 28950 dinov2 config.py:33] MODEL:
  WEIGHTS: ''
compute_precision:
  grad_scaler: true
  teacher:
    backbone:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
    dino_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
    ibot_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
  student:
    backbone:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp16
        buffer_dtype: fp32
    dino_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp32
        buffer_dtype: fp32
    ibot_head:
      sharding_strategy: SHARD_GRAD_OP
      mixed_precision:
        param_dtype: fp16
        reduce_dtype: fp32
        buffer_dtype: fp32
dino:
  loss_weight: 1.0
  head_n_prototypes: 65536
  head_bottleneck_dim: 256
  head_nlayers: 3
  head_hidden_dim: 2048
  koleo_loss_weight: 0.1
ibot:
  loss_weight: 1.0
  mask_sample_probability: 0.5
  mask_ratio_min_max:
  - 0.1
  - 0.5
  separate_head: false
  head_n_prototypes: 65536
  head_bottleneck_dim: 256
  head_nlayers: 3
  head_hidden_dim: 2048
train:
  batch_size_per_gpu: 64
  dataset_path: ImageNet:split=TRAIN
  output_dir: /lustre/fs1/home/cap6411.student12/dinov2/outputs/linear/s
  saveckp_freq: 20
  seed: 0
  num_workers: 10
  OFFICIAL_EPOCH_LENGTH: 1250
  cache_dataset: true
  centering: centering
student:
  arch: vit_small
  patch_size: 14
  drop_path_rate: 0.3
  layerscale: 1.0e-05
  drop_path_uniform: true
  pretrained_weights: ''
  ffn_layer: mlp
  block_chunks: 0
  qkv_bias: true
  proj_bias: true
  ffn_bias: true
  num_register_tokens: 0
  interpolate_antialias: false
  interpolate_offset: 0.1
teacher:
  momentum_teacher: 0.992
  final_momentum_teacher: 1
  warmup_teacher_temp: 0.04
  teacher_temp: 0.07
  warmup_teacher_temp_epochs: 30
optim:
  epochs: 100
  weight_decay: 0.04
  weight_decay_end: 0.4
  base_lr: 0.004
  lr: 0.0014142135623730952
  warmup_epochs: 10
  min_lr: 1.0e-06
  clip_grad: 3.0
  freeze_last_layer_epochs: 1
  scaling_rule: sqrt_wrt_1024
  patch_embed_lr_mult: 0.2
  layerwise_decay: 0.9
  adamw_beta1: 0.9
  adamw_beta2: 0.999
crops:
  global_crops_scale:
  - 0.32
  - 1.0
  local_crops_number: 8
  local_crops_scale:
  - 0.05
  - 0.32
  global_crops_size: 518
  local_crops_size: 98
evaluation:
  eval_period_iterations: 12500

I20231027 16:53:29 28950 dinov2 vision_transformer.py:122] using MLP layer as FFN
I20231027 16:53:39 28950 dinov2 utils.py:33] Pretrained weights found at pretrained/dinov2_vits14_pretrain.pth and loaded with msg: <All keys matched successfully>
I20231027 16:53:39 28950 dinov2 loaders.py:84] using dataset: "ImageNet:split=TRAIN:root=dataset:extra=dataset"
I20231027 16:53:39 28950 dinov2 loaders.py:89] # of dataset samples: 1,281,167
I20231027 16:53:44 28950 fvcore.common.checkpoint checkpoint.py:150] [Checkpointer] Loading from pretrained/dinov2_vits14_linear_head.pth ...
submitit ERROR (2023-10-27 16:53:44,246) - Submitted job triggered an exception
E20231027 16:53:44 28950 submitit submission.py:68] Submitted job triggered an exception

I assumed that since the dinov2 backbone weights are there and the classification head weights are also given, I should be able to load the classifier and evaluate it.
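
Inspecting the head checkpoint directly shows what fvcore's Checkpointer trips over: it expects a training checkpoint with a top-level "model" key (see the checkpoint.pop("model") frame in the traceback), while the released head file appears to be a bare state dict. A minimal check, using the same path as in my command:

import torch

# Load the released linear head on CPU and list its top-level keys.
# fvcore's Checkpointer calls checkpoint.pop("model"), which fails here
# because the file is a plain state dict with no "model" wrapper.
ckpt = torch.load("pretrained/dinov2_vits14_linear_head.pth", map_location="cpu")
print(list(ckpt.keys()))  # no "model" key among these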

And one more thing: if I remove --classifier-fpath pretrained/dinov2_vits14_linear_head.pth, it says no checkpoint is found and starts training the classifier, even though the script is supposed to be for evaluation (as the path run/eval/linear.py suggests).

patricklabatut commented 10 months ago

The evaluation script actually trains classifiers (a sweep over learning rates, as in the config above) in order to evaluate them, which is why it starts training when no checkpoint is found. To evaluate a specific pretrained classifier (like the provided models), one could either adapt the code to bypass the training and classifier selection and only run the corresponding evaluation code, or, perhaps more simply, manually run inference and compute accuracy on the ImageNet validation set.
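
For the second option, here is a minimal sketch of a manual evaluation. It assumes the dinov2_vits14_lc torch.hub entry point (backbone plus the matching pretrained ImageNet linear head) and a locally prepared ImageNet root ("dataset" below is a placeholder); the preprocessing follows the standard resize-256/center-crop-224 evaluation transform:

import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageNet

# Backbone + pretrained linear classification head as a single module.
# Assumption: the hub exposes "dinov2_vits14_lc" for ViT-S/14; other
# backbones follow the same naming scheme (e.g. dinov2_vitb14_lc).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14_lc")
model = model.cuda().eval()

# Standard ImageNet evaluation preprocessing.
transform = T.Compose([
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Placeholder root; point this at the local ImageNet directory.
val = ImageNet(root="dataset", split="val", transform=transform)
loader = DataLoader(val, batch_size=256, num_workers=8, pin_memory=True)

correct = total = 0
with torch.inference_mode():
    for images, labels in loader:
        logits = model(images.cuda(non_blocking=True))
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.numel()
print(f"top-1 accuracy: {correct / total:.4f}")

If the hub entry point is not available in a given checkout, the released dinov2_vits14_linear_head.pth weights can be loaded onto a plain linear layer instead, but note that the head then expects exactly the feature layout (which class tokens and pooled patch tokens are concatenated) that it was trained with.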