kobiso opened this issue 2 years ago
@alexeib I can't reproduce the classification accuracy (84.2%) reported in the paper using the following script:
```bash
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \
    --model beit_base_patch16_224 \
    --finetune $CHECKPOINT \
    --data_path ${DATA_PATH} --output_dir ${OUTPUT_DIR} --log_dir ${OUTPUT_DIR} --batch_size 128 --lr 4e-3 --update_freq 1 \
    --warmup_epochs 10 --epochs 100 --layer_decay 0.65 --drop_path 0.2 --drop 0.0 \
    --weight_decay 0.0 --mixup 0.8 --cutmix 1.0 --enable_deepspeed --nb_classes 1000 \
    --target_layer -1 --world_size 8 --dist_url $dist_url
```
The final accuracy I got was 83.96%.
I noticed that you provide fine-tuned checkpoints. As a sanity check, can you share the commands to run evaluation with your ImageNet fine-tuned models?
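For reference, this is the kind of invocation I would expect, though it is only a sketch: it assumes `run_class_finetuning.py` supports `--eval` and `--resume` the way the BEiT codebase it derives from does, and `$FINETUNED_CKPT` is a placeholder for one of your released checkpoints.

```bash
# Sketch only, not a verified command. Assumes an --eval flag and --resume
# checkpoint loading as in the BEiT codebase; $FINETUNED_CKPT is a placeholder
# for a released ImageNet fine-tuned checkpoint.
python -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \
    --model beit_base_patch16_224 \
    --eval \
    --resume $FINETUNED_CKPT \
    --data_path ${DATA_PATH} \
    --nb_classes 1000
```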
Update to my initial question: I can run run_class_finetuning.py with `--weight_decay 0.0` if I also pass `--enable_deepspeed`. However, the result was 83.894%, which is still lower than the paper's score (84.2%), as @Alxead mentioned. How should I reproduce the paper's score properly?
@arbabu123
❓ Questions and Help
Hello :) I found some configs that differ between the data2vec_vision README and the paper.
Q1: weight_decay
In data2vec_vision, the README script to fine-tune the ViT-B model is the one quoted at the top of this thread (the command with `--warmup_epochs 10` and `--weight_decay 0.0`).
However, if I set `--weight_decay 0.0`, an error occurs as below. If I instead set `--weight_decay 0.05`, as in BEiT, the experiment runs. Can you clarify this?
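(Related to Q1, per my update above: the only combination that has run for me is `--weight_decay 0.0` together with `--enable_deepspeed`. Below is a minimal sketch of that flag pairing; my guess is that the non-deepspeed optimizer path is what fails on a zero weight decay, but that cause is not confirmed.)

```bash
# From the update above: --weight_decay 0.0 only ran when --enable_deepspeed
# was also passed. That the non-deepspeed optimizer path is the culprit is
# my assumption, not something confirmed by the authors.
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \
    --model beit_base_patch16_224 \
    --finetune $CHECKPOINT \
    --data_path ${DATA_PATH} --output_dir ${OUTPUT_DIR} \
    --weight_decay 0.0 --enable_deepspeed \
    --nb_classes 1000
```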
Q2: warmup for finetuning ViT-B
In the paper, the warmup for fine-tuning ViT-B is 20 epochs. However, in the above script it is set with `--warmup_epochs 10`. Which one is correct?

Q3: warmup for finetuning ViT-L
The README script to fine-tune the ViT-L model is as below:
What is $WARMUP in that script? Is it 5, as mentioned in the paper?
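In the meantime I am assuming the paper's value; a minimal sketch of that substitution (my assumption, not a confirmed config):

```bash
# Assumption: $WARMUP takes the paper's ViT-L value of 5 warmup epochs.
# This only shows the substitution; the remaining flags come from the
# README's ViT-L script, which I have not reproduced here.
WARMUP=5
# ... passed to run_class_finetuning.py as: --warmup_epochs $WARMUP
```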
Thanks in advance!