facebookresearch / LaViLa

Code release for "Learning Video Representations from Large Language Models"
MIT License

Train LAVILA (L) to perform action recognition on the EPIC-100 dataset? #9

daiguangzhao opened this issue 1 year ago

daiguangzhao commented 1 year ago

Thank you and your team for this great work. I currently have only one node (8 GPUs in total). How should I fine-tune the model on the EPIC-100 dataset? Is the script below correct?

TimeSformer-Large

python run_with_submitit_finetune_classification.py \
    --pretrain-model $PATH \
    --use-vn-classifier --num-classes 97 300 3806 \
    --use-sgd --wd 4e-5 --lr-multiplier-on-backbone 0.1 \
    --use-checkpoint --node 1

zhaoyue-zephyrus commented 1 year ago

Hi @daiguangzhao ,

The command you attached should work. To get closer to our setting, you may also try to either (1) add --update-freq 4 or (2) linearly scale your learning rate by 1/4, namely --lr 7.5e-4, since you are using --node 1. Note that if your machine is not scheduled by slurm, you can simply use torchrun --nproc_per_node=8 main_finetune_classification.py ... to kick off your job.
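
For concreteness, a non-slurm, single-node sketch of option (2) might look like the command below. This is only an illustration: the flags are copied from the command in the question, and --lr 7.5e-4 applies the 1/4 scaling suggested above; adjust paths and defaults for your own setup.

# Sketch: single-node (8 GPU) fine-tuning without slurm, launched via torchrun.
# Alternatively, keep the default learning rate and add --update-freq 4 instead of --lr.
torchrun --nproc_per_node=8 main_finetune_classification.py \
    --pretrain-model $PATH \
    --use-vn-classifier --num-classes 97 300 3806 \
    --use-sgd --wd 4e-5 --lr-multiplier-on-backbone 0.1 \
    --use-checkpoint \
    --lr 7.5e-4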