I re-trained the model as per the README, running:
```bash
torchrun --nproc_per_node=8 --master_port=22447 --max_restarts=0 train.py \
  --model_name microsoft/Phi-3.5-vision-instruct --bf16 --pooling last \
  --dataset_name TIGER-Lab/MMEB-train \
  --subset_name A-OKVQA CIRR DocVQA ImageNet-A ImageNet_1K MSCOCO MSCOCO_t2i OK-VQA VisDial Visual7W-pointing CIFAR_100 ChartQA FashionIQ ImageNet-R InfographicsVQA MSCOCO_i2t NIGHTS VOC2007 Visual7W WebQA \
  --num_sample_per_subset 50000 \
  --image_dir MMEB-train \
  --max_len 256 --num_crops 16 --output_dir outputs_bs_64_c_16 --logging_steps 10 \
  --lr_scheduler_type linear --learning_rate 2e-5 --max_steps 2000 \
  --warmup_steps 200 --save_steps 1000 --normalize True \
  --temperature 0.02 --per_device_train_batch_size 8 \
  --grad_cache True --gc_q_chunk_size 1 --gc_p_chunk_size 1
```
However, I noticed the checkpoints produced have an incorrect `"hidden_size": 4096`. Manually correcting it in `config.json` solves the problem, and I could reproduce numbers similar to those reported in the paper. Do you have an idea of what might be causing this?
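For anyone hitting the same issue, here is a minimal sketch of the manual fix, assuming the correct value for the Phi-3.5-vision-instruct text backbone is `3072` (please verify against the base model's config) and using an illustrative checkpoint path:

```python
# Hypothetical one-off patch for a saved checkpoint's config.json.
# Assumptions: 3072 is the correct hidden_size for Phi-3.5-vision-instruct,
# and the checkpoint path below is just an example -- adjust both as needed.
import json
from pathlib import Path

config_path = Path("outputs_bs_64_c_16/checkpoint-2000/config.json")
config = json.loads(config_path.read_text())

# Only rewrite the file if the bad value is actually present.
if config.get("hidden_size") == 4096:
    config["hidden_size"] = 3072
    config_path.write_text(json.dumps(config, indent=2))
```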