facebookresearch/dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Cannot reproduce the linear ViT-S/14 result of 81.1%; could you please release all the training logs as you did for DINOv1? #106

Open · nemonameless opened this issue 1 year ago

nemonameless commented 1 year ago
python dinov2/eval/linear.py \
    --config-file dinov2/configs/eval/vits14_pretrain.yaml \
    --pretrained-weights ./models/dinov2_vits14_pretrain.pth \
    --output-dir outputs \
    --train-dataset ImageNet:split=TRAIN:root=./dataset/ILSVRC2012_1k:extra=./dataset/ILSVRC2012_1k \
    --val-dataset ImageNet:split=VAL:root=./dataset/ILSVRC2012_1k:extra=./dataset/ILSVRC2012_1k

With no code changes, I only got 80.0%, not the 81.1% reported in the README. The logs in outputs/results_eval_linear.json are:

iter: 1249
{"best_classifier": {"name": "classifier_1_blocks_avgpool_False_lr_0_01000", "accuracy": 0.7483000159263611}}

iter: 2499
{"best_classifier": {"name": "classifier_1_blocks_avgpool_False_lr_0_00500", "accuracy": 0.7661600112915039}}

iter: 3749
{"best_classifier": {"name": "classifier_1_blocks_avgpool_False_lr_0_00500", "accuracy": 0.7732399702072144}}

iter: 4999
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.7784199714660645}}

iter: 6249
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.7860599756240845}}

iter: 7499
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.7884399890899658}}

iter: 8749
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.7938200235366821}}

iter: 9999
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.7961999773979187}}

iter: 11249
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.7994800209999084}}

iter: 12500
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8002399802207947}}

So was the released linear head weight selected from these classifiers with different block counts and learning rates?
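
For context, the classifier names above suggest that linear.py trains a whole grid of linear heads at once and reports the best one. Here is a reconstruction of that grid from the log names alone (the actual sweep lives in dinov2/eval/linear.py, and the exact learning-rate list here is an assumption):

    import itertools

    # Grid implied by names like "classifier_4_blocks_avgpool_True_lr_0_01000";
    # the real grid in dinov2/eval/linear.py may differ.
    n_last_blocks_options = [1, 4]    # how many final ViT blocks feed the head
    avgpool_options = [False, True]   # whether avg-pooled patch tokens are concatenated
    learning_rates = [0.005, 0.01]    # assumed subset of the real sweep

    for n_blocks, use_avgpool, lr in itertools.product(
            n_last_blocks_options, avgpool_options, learning_rates):
        lr_str = f"{lr:.5f}".replace(".", "_")
        print(f"classifier_{n_blocks}_blocks_avgpool_{use_avgpool}_lr_{lr_str}")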

I hope you can release all the training logs as you did for DINOv1 (https://github.com/facebookresearch/dino), including both the pre-training and linear evaluation stages.

Thanks.

EddieAy commented 1 year ago
> [quoting nemonameless's comment above in full]

How do you get the files for extra=./dataset/ILSVRC2012_1k, like class-ids-TRAIN.npy and class-names-TRAIN.npy?
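
(For reference: the data preparation section of the DINOv2 README generates these metadata files with the ImageNet dataset's dump_extra() helper. A minimal sketch based on that section, using the paths from the commands above:)

    from dinov2.data.datasets import ImageNet

    # Writes the extra metadata (entries-*.npy, class-ids-*.npy,
    # class-names-*.npy) for every split into the "extra" directory.
    root = extra = "./dataset/ILSVRC2012_1k"
    for split in ImageNet.Split:
        dataset = ImageNet(split=split, root=root, extra=extra)
        dataset.dump_extra()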

XiaohuJoshua commented 1 year ago

I'm having a similar problem with the provided pretrained ViT-L/14 weights: linear eval reaches only 85.52%, not the 86.3% in the README. Is there a difference between the script dinov2/eval/linear.py and the script dinov2/run/eval/linear.py?

here is my script:

python dinov2/eval/linear.py \
    --config-file dinov2/configs/eval/vitl14_pretrain.yaml \
    --pretrained-weights ./models/dinov2_vitl14_pretrain.pth \
    --output-dir outputs \
    --train-dataset ImageNet:split=TRAIN:root=./dataset/ILSVRC2012_1k:extra=./dataset/ILSVRC2012_1k \
    --val-dataset ImageNet:split=VAL:root=./dataset/ILSVRC2012_1k:extra=./dataset/ILSVRC2012_1k

Logs are as follows:

iter: 1249
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8141999840736389}}

iter: 2499
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.8271999955177307}}

iter: 3749
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.8337000012397766}}

iter: 4999
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.8385400176048279}}

iter: 6249
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_00500", "accuracy": 0.8428599834442139}}

iter: 7499
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8461199998855591}}

iter: 8749
{"best_classifier": {"name": "classifier_1_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8496999740600586}}

iter: 9999
{"best_classifier": {"name": "classifier_4_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8521999716758728}}

iter: 11249
{"best_classifier": {"name": "classifier_4_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8547199964523315}}

iter: 12500
{"best_classifier": {"name": "classifier_4_blocks_avgpool_True_lr_0_01000", "accuracy": 0.8551999926567078}}

nemonameless commented 1 year ago

Here we provide a PaddlePaddle DINOv2 implementation for fine-tuning: https://github.com/PaddlePaddle/PASSL/tree/main/tasks/ssl/dinov2

XiaohuJoshua commented 1 year ago

I found it seems to be a problem of total batch size: when I use one node with 8 GPUs, I get 86.27%. Before that, I was using one node with a single GPU.
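
(A plausible explanation, not verified against the code: DINO v1's eval_linear.py scales the learning rate linearly with the total batch size (lr * total_batch_size / 256), so with a fixed per-GPU batch size a single-GPU run effectively trains with an 8x smaller learning rate. A sketch of that arithmetic, with the per-GPU batch size assumed:)

    # Linear lr scaling as used in DINO v1's eval_linear.py; whether
    # dinov2/eval/linear.py applies the same rule is an assumption here.
    base_lr = 0.01
    batch_size_per_gpu = 128  # assumed; check the eval config

    for num_gpus in (1, 8):
        total_batch_size = batch_size_per_gpu * num_gpus
        effective_lr = base_lr * total_batch_size / 256
        print(f"{num_gpus} GPU(s): total batch {total_batch_size}, lr {effective_lr:.4f}")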

EddieAy commented 1 year ago

> I found it seems to be a problem of total batch size: when I use one node with 8 GPUs, I get 86.27%. Before that, I was using one node with a single GPU.

https://github.com/facebookresearch/dinov2/issues/134#issuecomment-1630981911 It seems that you can use one node with 8 GPUs?

XiaohuJoshua commented 1 year ago

> I found it seems to be a problem of total batch size: when I use one node with 8 GPUs, I get 86.27%. Before that, I was using one node with a single GPU.

> #134 (comment) It seems that you can use one node with 8 GPUs?

My launch command looks like:

python -m torch.distributed.launch --nproc_per_node=8 --use_env dinov2/eval/linear.py ......

I guess the linear evaluation is fine as long as it runs on 8 GPUs.