JUNJIE99 / VISTA_Evaluation_FineTuning

Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/visual

question about stage 2 bash file #3

Open olccihyeon opened 1 week ago

olccihyeon commented 1 week ago

I have a couple of questions.

  1. In the paper you say you used a batch size of 1920 for stage 2. How many GPU nodes did you actually use, and how many GPUs per node?

  2. Looking at the bash file:

```bash
#!/bin/bash

env               # print the environment (useful for debugging multi-node runs)

GPUS_PER_NODE=8

# Change for multinode config
MASTER_ADDR=      # your machine address
MASTER_PORT=661
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

# Data and model
DATA_PATH=        # your training data
SAVE_PATH=        # your saving path
IMAGE_PATH=       # your image path
EPOCH=5
RESUME_PATH=      # the checkpoint path for initializing the model
SAVE_STEPS=100
GROUP_SIZE=4      # = one positive sample + number of hard negative samples
BSZ_PERGPU=80     # batch size per GPU
LR=2e-5

Training_Dir=     # your training dir
cd $Training_Dir

mkdir $SAVE_PATH

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

export LAUNCHER="torchrun \
    $DISTRIBUTED_ARGS \
    "

# training options ("--normlized" spelling kept as in the original script)
full_options=" \
    --output_dir $SAVE_PATH \
    --bge_model_name_or_path BAAI/bge-base-en-v1.5 \
    --visual_model_name_or_path EVA02-CLIP-B-16 \
    --dataloader_num_workers 1 \
    --train_data $DATA_PATH \
    --train_data_image $IMAGE_PATH \
    --train_group_size $GROUP_SIZE \
    --learning_rate $LR \
    --fp16 \
    --per_device_train_batch_size $BSZ_PERGPU \
    --dataloader_drop_last True \
    --normlized True \
    --temperature 0.02 \
    --logging_steps 10 \
    --num_train_epochs $EPOCH \
    --negatives_cross_device \
    --train_text_tower False \
    --train_vision_tower True \
    --resume_path $RESUME_PATH \
    --save_steps $SAVE_STEPS \
    --deepspeed ./EVA-CLIP/rei/training/deepspeed_config.json \
    --gradient_checkpointing \
    "

run_cmd="$LAUNCHER -m finetune.run_stage2_fusion ${full_options}"
echo ${run_cmd}
eval ${run_cmd} 2>&1 | tee $SAVE_PATH/output_$NODE_RANK.log

set +x
```

What were DeepSpeed and gradient_checkpointing actually used for here?

The DeepSpeed config published in the FlagEmbedding repo and the batch-size settings in the corresponding bash file are slightly different, and with gradient_checkpointing I get an error saying that a parameter is updated twice.

Could you please give me the exact bash file you actually used for that code?

Thank you for your time.


olccihyeon commented 1 week ago

Also, did you use --multi_task_fusion True in your actual experiment?

JUNJIE99 commented 1 week ago
  1. Three nodes, each with 8 GPUs (see the batch-size sketch after this list).
  2. We did not include the DeepSpeed config file in the FlagEmbedding release of VISTA. All parameters are those found in the paper and the bash script. DeepSpeed is used to save training memory, and you can configure it according to your needs. This is the exact bash script used in our experiment; we simply replaced our local paths.
  3. We use multi-task training in the second stage.
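
For concreteness, here is a minimal Python sketch (not from the VISTA or FlagEmbedding codebase) that ties these numbers together and writes one hypothetical DeepSpeed config whose batch-size fields cannot conflict with the bash script. It assumes the training entrypoint uses the HuggingFace Trainer, which its flags suggest; the file name and the "auto" placeholders are standard HF/DeepSpeed conventions, not the authors' released settings.

```python
import json

# Total contrastive batch size implied by this thread:
# 3 nodes x 8 GPUs (answer above) x BSZ_PERGPU=80 (posted script).
nnodes, gpus_per_node, bsz_per_gpu = 3, 8, 80
world_size = nnodes * gpus_per_node        # 24 training processes
total_batch = world_size * bsz_per_gpu     # 24 * 80 = 1920, matching the paper

# Hypothetical minimal DeepSpeed config: ZeRO stage 1 + fp16, with "auto"
# placeholders so the HuggingFace Trainer copies batch-size and clipping values
# from the command line instead of duplicating (and possibly contradicting) them.
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {"stage": 1},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}

with open("deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

print(f"world_size={world_size}, total_batch={total_batch}")
```
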
olccihyeon commented 1 week ago

Did the total (all-GPU) batch size affect the results?

I'm using 4 GPUs on one A100 node with a batch size of 80 per GPU, and I'm getting lower performance than reported in the paper.

Thank you for your answer

JUNJIE99 commented 1 week ago

Did you use the first-stage weights I provided for the second-stage training? Given that we employ cross-device negative sharing across all samples, the total batch size in contrastive learning indeed has a significant impact on model performance.
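
For intuition, here is a minimal sketch of what cross-device negative sharing looks like in a contrastive loss; it is not VISTA's actual implementation. It assumes torch.distributed is already initialized and that the query/passage embeddings are already L2-normalized, and it reuses the temperature of 0.02 from the posted script. The point is that each GPU scores its queries against the passages gathered from all GPUs, so the pool of in-batch negatives scales with the total batch size (1920 in the paper's setup versus 4 x 80 = 320 on a single 4-GPU node).

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def gather_with_grad(t: torch.Tensor) -> torch.Tensor:
    """All-gather a tensor from every rank, keeping gradients for the local shard.

    dist.all_gather does not propagate gradients, so the local slice is put back
    in place afterwards, a common simplification in contrastive training code.
    """
    gathered = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t.contiguous())
    gathered[dist.get_rank()] = t
    return torch.cat(gathered, dim=0)


def cross_device_contrastive_loss(q: torch.Tensor, p: torch.Tensor,
                                  temperature: float = 0.02) -> torch.Tensor:
    """q, p: (local_bsz, dim) normalized query/passage embeddings on this GPU."""
    all_p = gather_with_grad(p)              # (world_size * local_bsz, dim)
    logits = q @ all_p.t() / temperature     # passages from every GPU are candidates
    # The positive for local query i sits at offset rank * local_bsz + i.
    labels = torch.arange(q.size(0), device=q.device) + dist.get_rank() * q.size(0)
    return F.cross_entropy(logits, labels)
```

In the actual setup each query additionally carries GROUP_SIZE - 1 mined hard negatives (GROUP_SIZE=4 in the script); the sketch keeps only the in-batch part to show why the total batch size matters.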

JUNJIE99 commented 1 week ago

In addition, we do not recommend training the second stage for too many steps. Excessive steps can easily lead to overfitting on the VISTA-S2 dataset, thereby affecting the performance of zero-shot evaluation.

olccihyeon commented 1 week ago

Thank you for your answer! Now I understand it properly.