MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
https://mmmu-benchmark.github.io/
Apache License 2.0

RuntimeError: The size of tensor a (162) must match the size of tensor b (7) at non-singleton dimension 1 #18

Closed. nrikoh closed this issue 4 months ago.

nrikoh commented 6 months ago

When I run run_llava.py, I get the following error:

    Traceback (most recent call last):
      File "/home/MMMU-main/eval/run_llava.py", line 106, in <module>
        main()
      File "/home/MMMU-main/eval/run_llava.py", line 98, in main
        out_samples = run_model(args, samples, model, call_model_engine, tokenizer, processor)
      File "/home/MMMU-main/eval/run_llava.py", line 23, in run_model
        response = call_model_engine_fn(args, sample, model, tokenizer, processor)
      File "/home/MMMU-main/eval/utils/model_utils.py", line 57, in call_llava_engine_df
        n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
    RuntimeError: The size of tensor a (162) must match the size of tensor b (7) at non-singleton dimension 1

What causes this?
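For context, the failing check sits in the LLaVA-style answer-generation helper. A rough sketch of that region is shown below; it is paraphrased rather than copied from the repo, and `input_ids`, `output_ids`, and `tokenizer` are assumed to come from the surrounding generation code, so the actual model_utils.py may differ in details:

    # Rough sketch of the check around eval/utils/model_utils.py line 57 (paraphrased).
    input_token_len = input_ids.shape[1]
    # The prompt is expected to be echoed unchanged at the start of output_ids.
    n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
    if n_diff_input_output > 0:
        print(f'[Warning] {n_diff_input_output} output_ids are not the same as the input_ids')
    # Only the newly generated tokens are decoded as the model response.
    response = tokenizer.batch_decode(output_ids[:, input_token_len:], skip_special_tokens=True)[0]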

drogozhang commented 6 months ago

Hi, I need more context

nrikoh commented 6 months ago

> Hi, I need more context

After completing pretraining and instruction finetuning with the official LLaVA-1.5 code (https://github.com/haotian-liu/LLaVA), I followed https://github.com/MMMU-Benchmark/MMMU/blob/main/eval/run_llava.py to evaluate LLaVA-1.5. I downloaded the MMMU dataset from Hugging Face locally via git clone git@hf.co:datasets/MMMU/MMMU. The parameter configuration of run_llava.py is as follows:

     parser = ArgumentParser()
     parser.add_argument('--output_path', type=str, default='llava1.5_7b_val.json',
                         help='name of saved json')
     parser.add_argument('--config_path', type=str, default="configs/llava1.5.yaml")
     parser.add_argument('--data_path', type=str, default="/root/MMMU/MMMU") # hf dataset path.
     parser.add_argument('--model_path', type=str, default="/root/llava1.5_finetune/checkpoint/llava-v1.5-7b/")
     parser.add_argument('--split', type=str, default='validation')
     parser.add_argument('--seed', type=int, default=42)

The /root/llava1.5_finetune/checkpoint/llava-v1.5-7b/ path I specified contains: (screenshot of the checkpoint directory contents)

The situation at runtime is as follows: (screenshots of the runtime output)

Even when I downloaded the official LLaVA-1.5 weights from Hugging Face (https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main) and ran run_llava.py with them, I got the same error: RuntimeError: The size of tensor a (162) must match the size of tensor b (7) at non-singleton dimension 1.

drogozhang commented 6 months ago

Hi, I have never run into this situation before. Maybe you can try printing the shapes of input_ids, output_ids, and input_token_len?
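For anyone following along, a minimal way to act on this suggestion is to add temporary prints just above the failing comparison in eval/utils/model_utils.py (names as in the traceback above):

    # Temporary debugging prints, placed right before the failing comparison.
    print('input_ids shape:', input_ids.shape)        # prompt token ids, e.g. (1, 162) in this thread
    print('output_ids shape:', output_ids.shape)      # tokens returned by model.generate()
    print('input_token_len:', input_token_len)        # equals input_ids.shape[1]

Given the sizes in the error message (162 vs 7), output_ids apparently no longer starts with the echoed prompt, so slicing it to input_token_len yields a much shorter tensor than input_ids and the element-wise comparison cannot broadcast.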

nrikoh commented 6 months ago

> Hi, I have never run into this situation before. Maybe you can try printing the shapes of input_ids, output_ids, and input_token_len?

I have now also tried LLaVA-13B, using model weights from Hugging Face. It reports the same error that input_ids and output_ids do not match. When I commented out that line of code, inference ran normally. After running the 900-example validation set, I used main_eval_only.py for evaluation, but found that the earlier LLaVA-7B run and the current LLaVA-13B run gave the same results: both achieved an overall score of 26.7%. There are two things I don't understand now:

  1. Why do I get an error about inconsistent dimensions when I try to reproduce the reported MMMU results with the weights downloaded from Hugging Face?
  2. Why are the evaluation results of LLaVA-7B and LLaVA-13B exactly the same? (screenshot of the evaluation output)

nrikoh commented 6 months ago

The steps I took for my evaluation are:

  1. Install LLaVA, following https://github.com/haotian-liu/LLaVA
  2. Download the weights from Hugging Face: https://huggingface.co/liuhaotian/llava-v1.5-13b/tree/main
  3. Download the MMMU-main code from https://github.com/MMMU-Benchmark/MMMU
  4. Download the MMMU dataset with git clone git@hf.co:datasets/MMMU/MMMU
  5. Change the parameters in MMMU-main/eval/run_llava.py as follows:
    parser = ArgumentParser()
    parser.add_argument('--output_path', type=str, default='llava1.5_13b_val.json',
                        help='name of saved json')
    parser.add_argument('--config_path', type=str, default="configs/llava1.5.yaml")
    parser.add_argument('--data_path', type=str, default="/root/MMMU/MMMU") # hf dataset path.
    parser.add_argument('--model_path', type=str, default="/root/llava1.5_finetune/checkpoint/huggingface/llava-v1.5-13b/")
    parser.add_argument('--split', type=str, default='validation')
    parser.add_argument('--seed', type=int, default=42)

    In addition, to get the final numbers I commented out that part of the code in utils/model_utils.py, to avoid the dimension-mismatch error. Then I ran python run_llava.py and got llava1.5_13b_val.json. Next I changed the parameters in main_eval_only.py:

    parser.add_argument('--output_path', type=str, default="/home/MMMU-main/eval/llava1.5_13b_val_hf.json", help="The path to model output file.")
    parser.add_argument('--answer_path', type=str, default="./answer_dict_val.json", help="Answer file path.")

    Finally, I ran python main_eval_only.py and got the evaluation results shown below (screenshot of the evaluation results). Is there any step I missed that caused the results to go wrong?

drogozhang commented 6 months ago

My guess is that a LLaVA code update causes the dimension-mismatch issue, and simply commenting that line out will not solve it. I will look into it more deeply when I have more time.

youngwanLEE commented 6 months ago

@drogozhang @nrikoh The same problem occurs for me.

laaambs commented 6 months ago

@nrikoh @youngwanLEE @drogozhang I encountered the same issue. Do you have any solutions?

Isaachhh commented 5 months ago

transformers commit 633215ba58fe5114d8c8d32e415a04600e010701, modeling_llama.py:

(screenshot of the relevant modeling_llama.py code)

You should add the "Default to old behavior: keep only final ID" branch back.

yu-changqian commented 5 months ago

I have the same issue. The final results are wrong. Any update?

yu-changqian commented 5 months ago

> transformers commit 633215ba58fe5114d8c8d32e415a04600e010701, modeling_llama.py:
>
> (screenshot of the relevant modeling_llama.py code)
>
> You should add the "Default to old behavior: keep only final ID" branch back.

Could you give more details about solving this problem? Thanks.

Isaachhh commented 4 months ago

> Could you give more details about solving this problem? Thanks.

Find the transformers source code in your environment (likely under anaconda3/envs/xxxx/lib/python3.10/site-packages/transformers/models/llama), and edit modeling_llama.py by adding the "Default to old behavior: keep only final ID" branch back.

Like this (the screenshot is not reproduced here; see the sketch below):
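Since the screenshot does not come through, below is a sketch of the kind of edit being described: restoring the "keep only final ID" fallback inside prepare_inputs_for_generation in modeling_llama.py. It mirrors the pattern used in transformers releases of that era, but the exact surrounding code depends on your installed transformers version, so treat it as an illustration rather than an exact diff:

    # Inside LlamaForCausalLM.prepare_inputs_for_generation(...) in
    # .../site-packages/transformers/models/llama/modeling_llama.py
    if past_key_values:
        past_length = past_key_values[0][0].shape[2]

        # Some generation methods already pass only the last input ID
        if input_ids.shape[1] > past_length:
            remove_prefix_length = past_length
        else:
            # Default to old behavior: keep only final ID
            # (this is the branch to add back for LLaVA, whose KV cache
            # is longer than input_ids because of the image tokens)
            remove_prefix_length = input_ids.shape[1] - 1

        input_ids = input_ids[:, remove_prefix_length:]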

kyleliang919 commented 4 months ago

@Isaachhh Sorry, I don't fully understand the implication of this change. Is it just removing the input context altogether? The error above feels like a mismatch because the image tokens are not accounted for in n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item(). By adding the else branch back, it looks like the text context will also be ignored during generation?

Isaachhh commented 4 months ago

@kyleliang919 It's because past_length includes the image tokens produced by CLIP, while input_ids only contains a single -200 as the image placeholder. As a result, past_length > input_ids.shape[1] and the mismatch occurs.

This is due to a hacked-in implementation detail of the original LLaVA.
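Concretely, taking the 162-token prompt from the traceback above as an example, and assuming the standard LLaVA-1.5 vision tower (CLIP ViT-L/14 at 336px, which expands each image into 576 patch embeddings), the arithmetic looks roughly like this:

    # Illustrative numbers only; 576 patches is the usual LLaVA-1.5 setting.
    prompt_len = 162               # input_ids.shape[1]: text tokens plus one -200 image placeholder
    image_patches = 576            # embeddings inserted by the vision tower in place of the -200

    past_length = prompt_len - 1 + image_patches   # 737 entries in the KV cache after prefill
    print(past_length > prompt_len)                # True, so the newer transformers slicing
                                                   # never falls back to "keep only final ID"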

drogozhang commented 4 months ago

Thank you so much @Isaachhh for the help!

Closing this issue for now. Feel free to re-open it if necessary.

drogozhang commented 3 months ago

Hi, sorry for the delay; I was busy previously. Here is a simpler solution I found:

When installing LLaVA following its instructions, check out the tags/v1.1.3 version before installation.

That is, nothing changes in step 1:

    git clone https://github.com/haotian-liu/LLaVA.git
    cd LLaVA

In Step 2:

    conda create -n llava python=3.10 -y
    conda activate llava
    pip install --upgrade pip  # enable PEP 660 support
    git fetch --tags          # newly added
    git checkout tags/v1.1.3  # newly added
    pip install -e .

This will install the older LLaVA (1.5 only) and the corresponding correct transformers version.
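To double-check that the pin took effect, you can print the installed transformers version from inside the new llava environment; it should be an older release than the one that introduced the modeling_llama.py change discussed above (the exact version number is not asserted here):

    # Run inside the `llava` conda environment created above.
    import transformers
    print(transformers.__version__)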

Hope it helps!

drogozhang commented 3 months ago

I have just updated the README as well. Let me know if you have any other questions.