Hi, thanks for your interest. You can add `--model-base microsoft/phi-2 \` to the following code in `pope.sh`; then it should run:
https://github.com/MILVLG/imp/blob/48bdc607089385e017f7b5c1f16762625a39eaf1/scripts/eval/pope.sh#L22-L29
We will fix this problem as soon as possible. Let me know if you have any other questions.
Hi, I have added `--model-base`, but there is another problem.
Hi @BeachWang! This bug is also caused by an update to the phi-2 repo itself. Specifically, the phi-2 team changed the parameter names in the model weights, i.e., the keys in the `.safetensors` files. There is therefore a serious mismatch between the current phi-2 weights and our codebase, which leads to this loading error.
Our code cannot load the "new" version of phi-2 unless we migrate our code to phi-2's new model definition, but in that case old checkpoints would become unusable. We have therefore decided to keep our code as-is and recommend that users download the old version of the phi-2 repo.
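If you are curious, you can see the mismatch yourself by listing the parameter names stored in a checkpoint's `.safetensors` shards. Below is a minimal sketch using the `safetensors` library; the shard file name is a placeholder for whichever snapshot you have locally.

```python
from safetensors import safe_open

# Print the first few tensor names in a phi-2 shard. Comparing an old
# snapshot with the current one shows the renamed keys that break loading.
# "model-00001-of-00002.safetensors" is a placeholder shard name.
with safe_open("model-00001-of-00002.safetensors", framework="pt") as f:
    for name in list(f.keys())[:10]:
        print(name)
```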
To download the old version, you can run the following Python script, which saves phi-2 to a local folder:

```python
import os
# os.environ["https_proxy"] = "http://xxx.xxx.xxx.xxx:xx"  # in case you need a proxy to access the Hugging Face Hub
from huggingface_hub import snapshot_download

# Pin the download to a revision from before the weight renaming.
snapshot_download(
    repo_id="microsoft/phi-2",
    revision="d3186761bf5c4409f7679359284066c25ab668ee",
    local_dir="checkpoints/base/phi-2",
    local_dir_use_symlinks=False,
)
```
Then use `checkpoints/base/phi-2` as the `--model-base` argument for future experiments.
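As a quick sanity check that you got the pre-rename snapshot, you can try loading it directly. This is just a sketch: the old phi-2 revision ships its own modeling code on the Hub, which is why `trust_remote_code=True` is passed.

```python
from transformers import AutoModelForCausalLM

# Load the pinned snapshot; the pre-rename phi-2 relies on custom
# modeling code bundled with the repo, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "checkpoints/base/phi-2",
    trust_remote_code=True,
)
print(type(model).__name__)
```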
Hi @ParadoxZW. Do we need to roll back the code from #7 if we use the old version of phi-2, since the training code has been made compatible with the new version of phi-2?
No need to roll back. The code should always work with the old version of phi-2 for both training and evaluation.
See our latest update :) @BeachWang
Hi,
I have succeeded in reproducing your work, but I found some problems in the evaluation of POPE and SQA. Specifically, it should be `$EVAL_CKPT` rather than `$CKPT` in lines 27 and 47 of `pope.sh`. As for SQA, I got the following error when running `sqa.sh`.
The reason seems to be that imp uses `model_vqa_loader` to evaluate SQA, whereas LLaVA actually uses `model_vqa_science`.
Hi, thanks for the reminder. We will fix these bugs in the next update.
For ScienceQA, you should use `model_vqa_science` with `llava_test_CQM-A.json`. We are also rewriting a question file, `scienceqa_multi.jsonl`, which follows the multiple-choice prompt in LLaVA's Evaluation.md and fits `model_vqa_loader`; a rough sketch of such a conversion is shown below. You can see the details in the next update too.
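For reference, here is what such a conversion could look like. It assumes LLaVA's question-file conventions (one `question_id`/`image`/`text` record per line for `model_vqa_loader`, plus the letter-answer instruction from Evaluation.md); the actual `scienceqa_multi.jsonl` shipped in the update may differ.

```python
import json

# Rough sketch: convert LLaVA's llava_test_CQM-A.json into a jsonl
# question file in the format model_vqa_loader expects.
with open("llava_test_CQM-A.json") as f:
    questions = json.load(f)

with open("scienceqa_multi.jsonl", "w") as out:
    for q in questions:
        # The first human turn carries the question, context, and options;
        # drop the <image> token, which the loader inserts by itself.
        prompt = q["conversations"][0]["value"].replace("<image>", "").strip()
        # Append the multiple-choice instruction from LLaVA's Evaluation.md.
        prompt += "\nAnswer with the option's letter from the given choices directly."
        record = {"question_id": q["id"], "text": prompt}
        if "image" in q:
            record["image"] = q["image"]
        out.write(json.dumps(record) + "\n")
```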
Hi,
I have trained imp with LoRA. However, it does not run inference when I run the evaluation scripts. The following is the output when I evaluate POPE.