Hi! I fine-tuned a model on my dataset and would like to resume training from a saved .pt checkpoint. However, whenever I start fine-tuning, it always begins again from epoch 0.
My finetune.sh:
# Set the path to save checkpoints
OUTPUT_DIR='/home/jovyan/people/Murtazin/VideoMAE/output_ckpts/eval_lr_1e-3_epoch_55'
# path to Kinetics set (train.csv/val.csv/test.csv)
DATA_PATH='/home/jovyan/datasets/sign_language/WLASL/WLASL_kinetic_hardcode'
# path to pretrained model
MODEL_PATH='/home/jovyan/people/Murtazin/VideoMAE/ckpts/checkpoint.pth'
PT_PATH='/home/jovyan/people/Murtazin/VideoMAE/output_ckpts/eval_lr_1e-3_epoch_100/checkpoint-45/mp_rank_00_model_states.pt'
# batch_size can be adjusted according to number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
run_class_finetuning.py \
--model vit_large_patch16_224 \
--data_set WLASL \
--nb_classes 2000 \
--data_path ${DATA_PATH} \
--resume ${PT_PATH} \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR} \
--batch_size 2 \
--num_sample 2 \
--input_size 224 \
--short_side_size 224 \
--save_ckpt_freq 10 \
--num_frames 32 \
--sampling_rate 2 \
--opt adamw \
--lr 2e-3 \
--opt_betas 0.9 0.999 \
--weight_decay 0.05 \
--epochs 55 \
--dist_eval \
--test_num_segment 5 \
--test_num_crop 3 \
--enable_deepspeed
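One detail visible in the script: OUTPUT_DIR points at .../eval_lr_1e-3_epoch_55, while PT_PATH points at a checkpoint inside a different run directory, .../eval_lr_1e-3_epoch_100/checkpoint-45/. The bash function below is a hypothetical diagnostic helper (latest_ckpt_epoch is not part of VideoMAE or DeepSpeed). It reports the highest N among checkpoint-N entries in a directory, assuming the checkpoint-<N>/mp_rank_00_model_states.pt layout seen in PT_PATH. It is only a sketch for checking which epoch a resume could start from in each directory; it does not reproduce how run_class_finetuning.py actually loads a checkpoint when --enable_deepspeed is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the highest epoch N among entries named
# checkpoint-N (or checkpoint-N.<ext>) in a directory, or -1 if none exist.
# Assumes the checkpoint-<N>/ layout visible in PT_PATH.
latest_ckpt_epoch() {
    local dir="$1" latest=-1 path name n
    for path in "$dir"/checkpoint-*; do
        [ -e "$path" ] || continue      # glob matched nothing
        name=$(basename "$path")
        n=${name#checkpoint-}           # strip the "checkpoint-" prefix
        n=${n%%.*}                      # strip an extension such as .pth
        case "$n" in
            ''|*[!0-9]*) continue ;;    # skip non-numeric tags like "best"
        esac
        if [ "$n" -gt "$latest" ]; then
            latest=$n
        fi
    done
    echo "$latest"
}

# Example (paths copied from finetune.sh above):
# latest_ckpt_epoch /home/jovyan/people/Murtazin/VideoMAE/output_ckpts/eval_lr_1e-3_epoch_100
# latest_ckpt_epoch /home/jovyan/people/Murtazin/VideoMAE/output_ckpts/eval_lr_1e-3_epoch_55
```

If the first call prints 45 and the second prints -1, then any resume logic that only scans OUTPUT_DIR would find nothing to load there; whether the DeepSpeed code path scans OUTPUT_DIR or uses --resume directly is an assumption worth checking in the script's checkpoint-loading code.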