Open · daiguangzhao opened this issue 1 year ago
Thank you and your team for bringing us such great work. I currently have only one node (8 GPUs in total). How should I fine-tune the model on the EPIC-100 dataset? Is the script below correct?

TimeSformer-Large:

```
python run_with_submitit_finetune_classification.py \
    --pretrain-model $PATH \
    --use-vn-classifier --num-classes 97 300 3806 \
    --use-sgd --wd 4e-5 --lr-multiplier-on-backbone 0.1 \
    --use-checkpoint --node 1
```
Hi @daiguangzhao,

The command you attached should work. To be closer to our setting, you may also either (1) add `--update-freq 4`, or (2) linearly scale your learning rate by 1/4x, namely `--lr 7.5e-4`, if you are using `--node 1`. Note that if your machine is not scheduled by Slurm, you can simply use `torchrun --nproc_per_node=8 main_finetune_classification.py ...` to kick off your job.
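For concreteness, here is a sketch of how the two suggestions could be applied to the command above. All flags are taken from this thread; the base learning rate of 3e-3 is only inferred from the 1/4x scaling (7.5e-4 × 4 = 3e-3), so check the repository's defaults before relying on it:

```
# Option 1: keep the default learning rate and accumulate gradients over 4 steps
# (the factor 4 matches the 1/4x LR scaling suggested above)
python run_with_submitit_finetune_classification.py \
    --pretrain-model $PATH \
    --use-vn-classifier --num-classes 97 300 3806 \
    --use-sgd --wd 4e-5 --lr-multiplier-on-backbone 0.1 \
    --use-checkpoint --node 1 \
    --update-freq 4

# Option 2: keep the per-step batch size and scale the learning rate by 1/4x
# (assumes the default learning rate is 3e-3, so 3e-3 / 4 = 7.5e-4)
python run_with_submitit_finetune_classification.py \
    --pretrain-model $PATH \
    --use-vn-classifier --num-classes 97 300 3806 \
    --use-sgd --wd 4e-5 --lr-multiplier-on-backbone 0.1 \
    --use-checkpoint --node 1 \
    --lr 7.5e-4
```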
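If the machine is not scheduled by Slurm, the same job could be launched with `torchrun` instead, as the reply notes. This is a sketch assuming `main_finetune_classification.py` accepts the same training flags as the submitit wrapper, minus the scheduling-related `--node` option:

```
# Launch 8 workers on a single machine, one per GPU, without Slurm
torchrun --nproc_per_node=8 main_finetune_classification.py \
    --pretrain-model $PATH \
    --use-vn-classifier --num-classes 97 300 3806 \
    --use-sgd --wd 4e-5 --lr-multiplier-on-backbone 0.1 \
    --use-checkpoint \
    --update-freq 4
```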