PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License
549 stars 44 forks

Cannot run the training code #13

Closed pphuc25 closed 6 months ago

pphuc25 commented 6 months ago

I am trying to run the training code using the TextVideo sample with MSRVTT data. I run it with the following config:

CACHE_DIR= '/root/.cache'
TRAIN_DATA = '/content/MSRVTT_data.json'
# this script is for 640 total batch_size (n(16) GPUs * batch_size(10) * accum_freq(4))
%cd /content/LanguageBind
TORCH_DISTRIBUTED_DEBUG=DETAIL HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 torchrun --nnodes=1 --node_rank=0 --nproc_per_node 1 \
    -m main  \
    --train-data ${TRAIN_DATA} \
    --train-num-samples 1000 \
    --clip-type "vl" \
    --do_train \
    --lock-text --lock-image --text-type "mplug" \
    --init-temp 0.07 --learn-temp \
    --model "ViT-L-14" --cache-dir ${CACHE_DIR} \
    --convert_to_lora --lora_r 16 \
    --lr 1e-4 --coef-lr 1 \
    --beta1 0.9 --beta2 0.98 --wd 0.2 --eps 1e-6 \
    --num-frames 8 --force-patch-dropout 0.3 \
    --epochs 16 --batch-size 10 --accum-freq 4 --warmup 20 \
    --precision "amp" --workers 10 --video-decode-backend "imgs" \
    --save-frequency 1 --log-every-n-steps 20 --report-to "tensorboard" --resume "latest" \
    --do_eval \
    --val_vl_ret_data "msrvtt"

However, when I run it, I get the following error:

LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has 
been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

How can I fix it?

LinB203 commented 6 months ago

Sorry for the delay. Put the pretrained weights in path/to/LanguageBind. You can download them from Baidu disk, Peking University disk, or Google disk. We have updated the training instructions here.

By the way, we have already updated the code, so you should pull the latest version.
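
For anyone hitting the same LocalEntryNotFoundError: it is raised when offline mode is on (TRANSFORMERS_OFFLINE=1 in the command above) and the requested files are not in the local cache yet. Below is a minimal sketch, not part of the LanguageBind code, that checks whether a model repo already appears in the cache directory before switching offline mode on. It assumes the usual Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/`); the repo id `some-org/some-model` is a hypothetical placeholder, and the helper names are made up for illustration.

```python
import os
from pathlib import Path


def repo_cache_dir(cache_dir: str, repo_id: str) -> Path:
    """Folder that the Hugging Face hub cache layout uses for a model repo."""
    return Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))


def is_cached(cache_dir: str, repo_id: str) -> bool:
    """Return True if at least one snapshot of repo_id contains a file."""
    snapshots = repo_cache_dir(cache_dir, repo_id) / "snapshots"
    if not snapshots.is_dir():
        return False
    return any(
        p.is_file() or p.is_symlink()
        for rev in snapshots.iterdir()
        if rev.is_dir()
        for p in rev.rglob("*")
    )


def offline_env(cache_dir: str, repo_id: str) -> dict:
    """Enable offline mode only when the repo is already in the cache."""
    flag = "1" if is_cached(cache_dir, repo_id) else "0"
    return {"TRANSFORMERS_OFFLINE": flag}


if __name__ == "__main__":
    # Hypothetical repo id; replace it with the model your run actually loads.
    repo = "some-org/some-model"
    cache = "/root/.cache"
    os.environ.update(offline_env(cache, repo))
    print("cached:", is_cached(cache, repo))
    print("TRANSFORMERS_OFFLINE =", os.environ["TRANSFORMERS_OFFLINE"])
```

If the check reports that nothing is cached, either run once with the offline variables unset so the files can be downloaded, or place the pretrained weights locally as described above.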