Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

[Evaluation] Evaluation with Otter pre-trained model on COCO dataset returns 0 CIDEr score #287

Open Thedatababbler opened 9 months ago

Thedatababbler commented 9 months ago

Hi, I used the evaluate.py file in the pipeline to evaluate my models. To make sure the evaluation can run on my single-GPU node, I made some minor changes to the multi-GPU environment initialization code and kept everything else the same. The modified shell script is the following:

```bash
#!/bin/bash

export CUDA_VISIBLE_DEVICES="0"
export MASTER_ADDR="localhost"
export MASTER_PORT="29501"
export WORLD_SIZE=4
export RANK=0

cd /path/to/Otter
realpath .

python -m pipeline.eval.evaluate \
    --model=otter \
    --results_file="OTTER_mpt1b_origin.json" \
    --model_path="luodian/OTTER-MPT1B-RPJama-Init" \
    --precision="bf16" \
    --batch_size=1 \
    --eval_coco \
    --device="cuda" \
    --coco_train_image_dir_path "/path/to/images/train2014" \
    --coco_val_image_dir_path "/path/to/coco/images/val2014" \
    --coco_karpathy_json_path "/path/to/dataset_coco.json" \
    --coco_annotations_json_path "/path/to/captions_val2014.json"
```
The above shell script runs the COCO evaluation with the pre-trained Otter 1B model. However, it returns a CIDEr score of 0 for all few-shot tests.

Strangely, after I add the argument below to the run script, the evaluation returns normal numbers for all tests: `--checkpoint_path="path/to/checkpoint/OTTER-MPT1B-RPJama-Init/final_weights.pt"`, where the .pt file is a model I fine-tuned myself. It seems the model does not properly load the pre-trained weights, which is why the evaluation only returns results once my own checkpoint is loaded.

Could you help locate the problem? Which part of the code could be responsible for this bug? Thank you!
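For what it's worth, one way to check this hypothesis is to look at what the fine-tuned checkpoint actually contains. The snippet below is a minimal sketch (not the pipeline's own code), assuming `final_weights.pt` is a state_dict, or a dict wrapping one, saved with `torch.save`; the path is the same placeholder as above:

```python
# Minimal sketch: inspect the fine-tuned checkpoint to see which parameter
# names it contains. Assumes final_weights.pt is a plain state_dict (or a
# dict wrapping one) saved with torch.save; the path is a placeholder.
import torch

ckpt = torch.load(
    "path/to/checkpoint/OTTER-MPT1B-RPJama-Init/final_weights.pt",
    map_location="cpu",
)

# Some training scripts wrap the weights, e.g. {"model_state_dict": {...}}.
state_dict = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

print(f"{len(state_dict)} entries in the checkpoint")
for name, tensor in list(state_dict.items())[:5]:
    # Prefixes such as "module." or "model." often explain why weights
    # silently fail to match when a load is done with strict=False.
    print(name, tuple(tensor.shape))
```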

Luodian commented 9 months ago

Hi, MPT1B-Init is only an initialization weight for training the Otter-MPT-1B model. It has not been trained; it is migrated directly from OpenFlamingo-1B, with special tokens added. So evaluating this weight may not give meaningful results. Could you try the MPT7B version?

Best Regards,
Bo

Thedatababbler commented 9 months ago

I replaced "luodian/OTTER-MPT1B-RPJama-Init" with "luodian/OTTER-Image-MPT7B" in the --model_path argument and ran the evaluation again. The CIDEr score is still 0.0 for all shots. It's really weird. What did I do wrong?

Luodian commented 9 months ago

@pufanyi Could you take a look at this issue? I thought we ran the OTTER-MPT7B evaluation and reported good numbers on COCO.