facebookresearch / LaViLa

Code release for "Learning Video Representations from Large Language Models"
MIT License

helper script to convert egovlp checkpoint #26

Closed by zhaoyue-zephyrus 11 months ago

zhaoyue-zephyrus commented 11 months ago

Usage: First, run:

# Need to explicitly export PYTHONPATH=<EgoVLP_ROOT> to go through its parser
PYTHONPATH=<EgoVLP_ROOT> python scripts/convert_egovlp_ckpt.py \
    --input-ckpt <EGOVLP_PATH> \
    --output-ckpt egovlp_converted.pth
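For intuition, the conversion step boils down to renaming the parameter keys in the EgoVLP `state_dict` to match LaViLa's module names. The sketch below is a hypothetical illustration of that idea, not the real `scripts/convert_egovlp_ckpt.py`: the prefixes in `mapping` are made up, and plain lists stand in for tensors so the snippet runs without PyTorch.

```python
# Hypothetical sketch of a state_dict key-remapping step. The real
# scripts/convert_egovlp_ckpt.py in LaViLa defines the actual mapping;
# the prefixes below are illustrative assumptions.

def convert_state_dict(egovlp_state, prefix_map):
    """Rename parameter keys from one prefix scheme to another.

    Keys that match no source prefix are copied through unchanged.
    """
    converted = {}
    for key, value in egovlp_state.items():
        for src, dst in prefix_map.items():
            if key.startswith(src):
                converted[dst + key[len(src):]] = value
                break
        else:
            converted[key] = value
    return converted


# Toy example with made-up prefixes (lists stand in for tensors).
state = {
    "video_model.blocks.0.weight": [0.1],
    "text_model.embeddings.weight": [0.2],
    "logit_scale": [1.0],
}
mapping = {"video_model.": "visual.", "text_model.": "textual."}
converted = convert_state_dict(state, mapping)
```

In the real script the result would then be wrapped in a checkpoint dict and written out with `torch.save` to produce `egovlp_converted.pth`.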

Next, attach the args metadata to the checkpoint by resuming from it with the pre-training script:

PYTHONPATH=. torchrun --nproc_per_node=8 main_pretrain.py \
    --clip-length 16 --clip-stride 4 \
    --model CLIP_HF_EGOVLP_DISTILBERT_BASE \
    --resume ./egovlp_converted.pth \
    --output-dir <EXP_DIR>

Note that you don't need to run the entire pre-training. Simply modify main_pretrain.py by adding a dist_utils.save_on_master() call at the very beginning of the training loop and exiting early. The saved checkpoint will then have the same format as the exported ones in MODEL_ZOO.md#zero-shot, so you can use it directly for zero-shot evaluation (compatible with eval_zeroshot.py).
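The early-exit edit described above can be sketched as follows. This is a standalone illustration, not a patch to the real main_pretrain.py: `save_on_master` is replaced by a local pickle-based stub (the real `dist_utils.save_on_master` uses `torch.save` and writes only on the rank-0 process), and the training-loop body is elided.

```python
# Hypothetical illustration of saving a checkpoint at the top of the
# training loop and exiting early. In main_pretrain.py you would call the
# real dist_utils.save_on_master(...) instead of this stub.
import os
import pickle


def save_on_master(obj, path):
    # Stub: the real helper serializes with torch.save on rank 0 only.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def train(model_state, args, output_dir):
    ckpt_path = os.path.join(output_dir, "checkpoint.pt")
    for epoch in range(args["epochs"]):
        # Added lines: dump the checkpoint immediately, then return
        # before any actual training step runs.
        save_on_master({"state_dict": model_state, "args": args}, ckpt_path)
        return ckpt_path
        # ... original training-loop body would follow here ...
```

Because the checkpoint already contains both the converted weights and the attached args, nothing else from the loop is needed before the early exit.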