MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

Fail to finetune from the provided pretrained model checkpoint on UCF101 #92

Closed Yisen-Feng closed 1 year ago

Yisen-Feng commented 1 year ago

Hi! I have tried to reproduce the results on UCF101. I succeeded in testing the fine-tuned checkpoint, but I failed to fine-tune from the pretrained checkpoint. I am using this script and this checkpoint. Is there something wrong with my settings? @yztongzhan This is my finetune_log. The dataset is split following the trainlist01 and testlist01 released with UCF101, and the val set is the same as the test set. The following is my script:

OUTPUT_DIR='./ucf101/finetune1'
DATA_PATH='../data/ucf101/UCF-101'
MODEL_PATH='../model/videomae/ucf101/finetune_checkpoint.pth'

OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node=1 \
    --master_port 12320 run_class_finetuning.py \
    --model vit_base_patch16_224 \
    --data_path ${DATA_PATH} \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --data_set UCF101 \
    --nb_classes 101 \
    --batch_size 16 \
    --input_size 224 \
    --short_side_size 224 \
    --save_ckpt_freq 50 \
    --num_frames 16 \
    --sampling_rate 4 \
    --num_sample 2 \
    --opt adamw \
    --lr 5e-4 \
    --warmup_lr 1e-8 \
    --min_lr 1e-5 \
    --layer_decay 0.7 \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --epochs 100 \
    --test_num_segment 5 \
    --test_num_crop 3 \
    --fc_drop_rate 0.5 \
    --drop_path 0.2 \
    --use_checkpoint \
    --dist_eval \
    --enable_deepspeed

yztongzhan commented 1 year ago

I recommend increasing the batch size as we default to --nproc_per_node=8.
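
For anyone hitting the same gap, here is a minimal sketch of the arithmetic behind this advice, assuming the fine-tuning script scales the actual learning rate by the total batch size over 256, as BEiT/MAE-style codebases do; check your local run_class_finetuning.py for the exact rule (it may also fold --num_sample into the total):

# Hedged sketch: why one GPU with the unchanged per-GPU batch size hurts.
# Assumption: actual lr = base lr * total batch size / 256.
NPROC=1          # --nproc_per_node used above
BATCH=16         # --batch_size used above
BASE_LR=5e-4     # --lr used above
TOTAL_BATCH=$((NPROC * BATCH))   # 16 here, versus 8 * 16 = 128 in the default recipe
SCALED_LR=$(python3 -c "print(${BASE_LR} * ${TOTAL_BATCH} / 256)")
echo "effective batch = ${TOTAL_BATCH}, scaled lr = ${SCALED_LR}"

So with a single GPU, either raise --batch_size as far as memory allows or raise --lr to keep the effective learning rate close to the 8-GPU recipe.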

Yisen-Feng commented 1 year ago

Thanks for your reply!

wjj-w commented 1 year ago

Hello, could you share your csv files for UCF101? I'm having some problems reading the videos. Looking forward to your reply.

Yisen-Feng commented 1 year ago

I recommend referring to Data Preparation to build the csv files yourself. Mine cannot be used directly: test.csv train.csv
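
For reference, a sketch of how train.csv and test.csv could be built from the official UCF101 split files, assuming the space-separated "video_path label" layout with 0-based labels described in the repo's Data Preparation guide (delimiter and label indexing should be double-checked there); the split-folder path below is hypothetical:

# Hedged sketch: build annotation csv files from the official UCF101 splits.
DATA_PATH='../data/ucf101/UCF-101'           # video root, as in the script above
SPLIT_DIR='../data/ucf101/ucfTrainTestlist'  # hypothetical path to the split files

# trainlist01.txt lines look like "ApplyEyeMakeup/v_..._c01.avi 1" (1-based label)
awk -v root="$DATA_PATH" '{sub(/\r$/, ""); printf "%s/%s %d\n", root, $1, $2 - 1}' \
    "$SPLIT_DIR/trainlist01.txt" > train.csv

# testlist01.txt has no label column, so look the label up in classInd.txt
awk -v root="$DATA_PATH" '
    {sub(/\r$/, "")}                              # strip Windows line endings
    NR == FNR {cls[$2] = $1 - 1; next}            # classInd.txt: "1 ApplyEyeMakeup"
    {split($1, a, "/"); printf "%s/%s %d\n", root, $1, cls[a[1]]}
' "$SPLIT_DIR/classInd.txt" "$SPLIT_DIR/testlist01.txt" > test.csv

cp test.csv val.csv   # the setup above uses the same split for val and test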

wjj-w commented 1 year ago

OK, thanks!