Hi, I need more context.
After I completed pretraining and instruction finetuning using the official LLaVA-1.5 code (https://github.com/haotian-liu/LLaVA), I followed https://github.com/MMMU-Benchmark/MMMU/blob/main/eval/run_llava.py to evaluate LLaVA-1.5. I downloaded MMMU from Hugging Face to local storage via git clone git@hf.co:datasets/MMMU/MMMU. The following is the parameter configuration of run_llava.py:
parser = ArgumentParser()
parser.add_argument('--output_path', type=str, default='llava1.5_7b_val.json',
help='name of saved json')
parser.add_argument('--config_path', type=str, default="configs/llava1.5.yaml")
parser.add_argument('--data_path', type=str, default="/root/MMMU/MMMU") # hf dataset path.
parser.add_argument('--model_path', type=str, default="/root/llava1.5_finetune/checkpoint/llava-v1.5-7b/")
parser.add_argument('--split', type=str, default='validation')
parser.add_argument('--seed', type=int, default=42)
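For reference, the same configuration can be passed on the command line instead of editing the defaults (the paths here are just the ones from my setup above, not required values):

```bash
python run_llava.py \
    --model_path /root/llava1.5_finetune/checkpoint/llava-v1.5-7b/ \
    --data_path /root/MMMU/MMMU \
    --config_path configs/llava1.5.yaml \
    --split validation \
    --output_path llava1.5_7b_val.json
```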
The /root/llava1.5_finetune/checkpoint/llava-v1.5-7b/ path I specified includes:
The situation at runtime is as follows:
Even when I downloaded the official LLaVA-1.5 weights from Hugging Face (https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main) and ran run_llava.py locally, I got the same error: RuntimeError: The size of tensor a (162) must match the size of tensor b (7) at non-singleton dimension 1
Hi, I have never met this situation before, so maybe you can try printing the shapes of input_ids, output_ids, and input_token_len?
I have now also tried LLaVA-1.5-13B, using the model weights from Hugging Face. The same error is reported there: input_ids and output_ids do not match. When I commented out that line of code, inference ran normally. After running the 900-question validation set, I used main_eval_only.py for evaluation, but found that the previous LLaVA-7B and the current LLaVA-13B produced identical results, both reaching 26.7% "Overall". There are two things I don't understand now:
The steps I take for my assessment are:
git clone git@hf.co:datasets/MMMU/MMMU
parser = ArgumentParser()
parser.add_argument('--output_path', type=str, default='llava1.5_13b_val.json',
help='name of saved json')
parser.add_argument('--config_path', type=str, default="configs/llava1.5.yaml")
parser.add_argument('--data_path', type=str, default="/root/MMMU/MMMU") # hf dataset path.
parser.add_argument('--model_path', type=str, default="/root/llava1.5_finetune/checkpoint/huggingface/llava-v1.5-13b/")
parser.add_argument('--split', type=str, default='validation')
parser.add_argument('--seed', type=int, default=42)
In addition, in order to get the final metric, I commented out this part of the code in utils/model_utils.py to avoid the dimension-mismatch error:
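For reference, the region being commented out is presumably the input/output consistency check in call_llava_engine_df; a minimal sketch of it, assuming the usual LLaVA-style decode path (not the verbatim file contents):

```python
# utils/model_utils.py, inside call_llava_engine_df (sketch, not verbatim)
input_token_len = input_ids.shape[1]

# The check below is what raises the size-mismatch RuntimeError when
# output_ids is shorter than input_token_len; commenting it out only
# silences the warning logic, it does not fix the underlying cache issue.
# n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
# if n_diff_input_output > 0:
#     print(f'[Warning] {n_diff_input_output} output_ids are not the same as the input_ids')

# Decode only the newly generated tokens.
response = tokenizer.batch_decode(
    output_ids[:, input_token_len:], skip_special_tokens=True
)[0].strip()
```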
Then I ran python run_llava.py and got llava1.5_13b_val.json. Next, I changed the parameters in main_eval_only.py:
parser.add_argument('--output_path', type=str, default="/home/MMMU-main/eval/llava1.5_13b_val_hf.json", help="The path to model output file.")
parser.add_argument('--answer_path', type=str, default="./answer_dict_val.json", help="Answer file path.")
Then I ran python main_eval_only.py and got the evaluation results shown below.
Is there any step I missed that caused the result to go wrong?
My guess is that the LLaVA code update causes the dimension-mismatch issue, and simply commenting that check out would not truly solve it. I will dig into it when I have more time.
@drogozhang @nrikoh The same problem occurs for me.
@nrikoh @youngwanLEE @drogozhang I encountered the same issue. Do you have any solutions?
See transformers commit 633215ba58fe5114d8c8d32e415a04600e010701, modeling_llama.
You should add the "Default to old behavior: keep only final ID" branch back.
I have the same issue. The final results are wrong. Any update?
Could you give more details about solving this problem? Thanks.
Find the source code of transformers in your environment (probably somewhere like anaconda3/envs/xxxx/lib/python3.10/site-packages/transformers/models/llama), and then edit modeling_llama.py by adding the "Default to old behavior: keep only final ID" branch back.
Like this:
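Roughly, the restored branch looks like the following. This is an illustrative sketch of prepare_inputs_for_generation in modeling_llama.py; the exact surrounding code depends on your transformers version:

```python
# modeling_llama.py, inside LlamaForCausalLM.prepare_inputs_for_generation
# (illustrative sketch; exact code differs across transformers versions)
if past_key_values is not None:
    past_length = past_key_values[0][0].shape[2]

    # Some generation methods already pass only the last input ID
    if input_ids.shape[1] > past_length:
        remove_prefix_length = past_length
    else:
        # Default to old behavior: keep only final ID
        # With LLaVA, past_length counts the CLIP image tokens while
        # input_ids holds only the single -200 image placeholder, so
        # past_length can exceed input_ids.shape[1]; without this branch
        # the prefix slicing below goes wrong.
        remove_prefix_length = input_ids.shape[1] - 1

    input_ids = input_ids[:, remove_prefix_length:]
```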
@Isaachhh Sorry, I don't fully understand the implication of this change. Is it just removing the input context altogether? The error above feels like a mismatch because the image tokens are not accounted for in n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item(). By adding the else branch back as above, it looks like the text context will also be ignored during generation?
@kyleliang919 It's because past_length includes the image tokens produced by CLIP, while input_ids only includes a single -200 as the image placeholder. So past_length > input_ids.shape[1] and the mismatch occurs. It comes from a hacked-in implementation in the original LLaVA.
Thank you so much @Isaachhh for the help!
Close this issue for now. Feel free to re-open it if necessary.
Hi, sorry for being busy previously; here is the simpler solution I found: when installing LLaVA following the instructions, check out the tags/v1.1.3 version before installation.
i.e., nothing changes in Step 1:
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
In Step 2:
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip # enable PEP 660 support
git fetch --tags # newly added.
git checkout tags/v1.1.3 # newly added.
pip install -e .
This will install the older LLaVA code (1.5 only) together with the corresponding, correct transformers version.
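A quick sanity check after pip install -e . is to confirm which versions actually got installed (the editable package is assumed to install under the name llava):

```bash
python -c "import transformers; print(transformers.__version__)"
pip show llava   # package name assumed; prints the installed LLaVA version
```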
Hope it helps!
I have just updated the README as well. Let me know if you have any other questions.
When I run run_llava.py, I get the following error:
Traceback (most recent call last):
  File "/home/MMMU-main/eval/run_llava.py", line 106, in <module>
    main()
  File "/home/MMMU-main/eval/run_llava.py", line 98, in main
    out_samples = run_model(args, samples, model, call_model_engine, tokenizer, processor)
  File "/home/MMMU-main/eval/run_llava.py", line 23, in run_model
    response = call_model_engine_fn(args, sample, model, tokenizer, processor)
  File "/home/MMMU-main/eval/utils/model_utils.py", line 57, in call_llava_engine_df
    n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
RuntimeError: The size of tensor a (162) must match the size of tensor b (7) at non-singleton dimension 1
What causes this?