deepglint / unicom

MLCD & UNICOM : Large-Scale Visual Representation Model
https://huggingface.co/DeepGlint-AI/mlcd-vit-large-patch14-336

The expanded size of the tensor must match the existing size #69

Closed blackDZS closed 1 day ago

blackDZS commented 6 days ago

When I run ./eval.sh, it raises the error below:

export PYTHONPATH=$(pwd)
export HF_ENDPOINT=https://hf-mirror.com
export HF_TOKEN=hf_xxx

model_path="/data/tbsi/train_log/llavanext/20241119/checkpoints/llavanext-_data_tbsi_model_weights_clip-vit-large-patch14-_data_tbsi_model_weights_Qwen_Qwen2.5-7B-Instruct-mlp2x_gelu-pretrain_blip558k-finetune_llavanext780k"
conv_template='qwen_1_5'
run_port=12444
model_name='llavanext_qwen_2_5'

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m accelerate.commands.launch \
    --main_process_port=$run_port \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=$model_path,conv_template=$conv_template \
    --tasks mmbench,mme,mmmu,ocrbench,scienceqa,seedbench,gqa,realworldqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix $model_name \
    --output_path ./eval_log/ 
Traceback (most recent call last):
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/__main__.py", line 206, in cli_evaluate
    results, samples = cli_evaluate_single(args)
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/__main__.py", line 301, in cli_evaluate_single
    results = evaluator.simple_evaluate(
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/utils.py", line 453, in _wrapper
    return fn(*args, **kwargs)
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/evaluator.py", line 135, in simple_evaluate
    results = evaluate(
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/utils.py", line 453, in _wrapper
    return fn(*args, **kwargs)
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/evaluator.py", line 297, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)  # Choiszt run generate until
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/models/llava.py", line 407, in generate_until
    raise e
  File "/home/lanyun/project/evaluation/lmms-eval/lmms_eval/models/llava.py", line 392, in generate_until
    cont = self.model.generate(
  File "/home/lanyun/miniconda3/envs/unicom/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/lanyun/project/train/unicom/llava/model/language_model/llava_qwen.py", line 131, in generate
    (inputs, position_ids, attention_mask, _, inputs_embeds, _) = self.prepare_inputs_labels_for_multimodal(inputs, position_ids, attention_mask, None, None, images, modalities, image_sizes=image_sizes)
  File "/home/lanyun/project/train/unicom/llava/model/llava_arch.py", line 397, in prepare_inputs_labels_for_multimodal
    image_feature = torch.cat((image_feature, self.model.image_newline[:, None, None].expand(*image_feature.shape[:-1], 1).to(image_feature.device)), dim=-1)
RuntimeError: The expanded size of the tensor (8960) must match the existing size (3584) at non-singleton dimension 0.  Target sizes: [8960, 34, 1].  Tensor sizes: [3584, 1, 1]
11-20 13:28:07 [lmms-eval/lmms_eval/__main__.py:220] ERROR Error during evaluation: The expanded size of the tensor (8960) must match the existing size (3584) at non-singleton dimension 0.  Target sizes: [8960, 34, 1].  Tensor sizes: [3584, 1, 1]
The same traceback is repeated by the remaining ranks (and again by the progress-bar workers); only the shapes in the final RuntimeError differ across ranks, e.g.:

RuntimeError: The expanded size of the tensor (8960) must match the existing size (3584) at non-singleton dimension 0.  Target sizes: [8960, 16, 1].  Tensor sizes: [3584, 1, 1]
RuntimeError: The expanded size of the tensor (7168) must match the existing size (3584) at non-singleton dimension 0.  Target sizes: [7168, 6, 1].  Tensor sizes: [3584, 1, 1]
11-20 13:28:23 [lmms-eval/lmms_eval/__main__.py:220] ERROR Error during evaluation: The expanded size of the tensor (7168) must match the existing size (3584) at non-singleton dimension 0.  Target sizes: [7168, 6, 1].  Tensor sizes: [3584, 1, 1]
yiyexy commented 6 days ago

There seems to be something wrong with the shape of image_feature.

The number of image features should be a multiple of 576. Please debug along this line.
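For context on the error message itself (not from the thread, just an illustration of the PyTorch rule involved): `expand` can only broadcast dimensions of size 1, so `image_newline[:, None, None]` (shape `[3584, 1, 1]`) expands fine when the target's first dimension is also 3584, but fails as soon as `image_feature.shape[0]` drifts to 8960 or 7168. A minimal sketch, using a zero tensor as a stand-in for `model.image_newline`:

```python
import torch

# Stand-in for model.image_newline: one hidden-size vector (3584 for a
# Qwen2.5-7B-class model).
newline = torch.zeros(3584)

# Singleton dims broadcast fine; the non-singleton dim 0 must match exactly.
ok = newline[:, None, None].expand(3584, 34, 1)
print(ok.shape)  # torch.Size([3584, 34, 1])

try:
    # This mirrors the traceback: dim 0 is 3584 (non-singleton), so it
    # cannot be expanded to 8960.
    newline[:, None, None].expand(8960, 34, 1)
except RuntimeError as e:
    print("RuntimeError:", e)
```

So the `expand` call is only a symptom: the real question is why `image_feature`'s channel dimension is no longer 3584 by the time it reaches the `torch.cat`.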

blackDZS commented 5 days ago

Thanks for your reply. I have debugged the image features in llava_arch.py; the code below changes the tensor size of image_feature from 3584 to 7168, and I can't understand why.

The number of image features should be a multiple of 576.

                        if "anyres_max" in image_aspect_ratio:
                            matched_anyres_max_num_patches = re.match(r"anyres_max_(\d+)", image_aspect_ratio)
                            if matched_anyres_max_num_patches:
                                max_num_patches = int(matched_anyres_max_num_patches.group(1))

                        if image_aspect_ratio == "anyres" or "anyres_max" in image_aspect_ratio:
                            if hasattr(self.get_vision_tower(), "image_size"):
                                vision_tower_image_size = self.get_vision_tower().image_size
                            else:
                                raise ValueError("vision_tower_image_size is not found in the vision tower.")
                            try:
                                num_patch_width, num_patch_height = get_anyres_image_grid_shape(image_sizes[image_idx], self.config.image_grid_pinpoints, vision_tower_image_size)
                            except Exception as e:
                                rank0_print(f"Error: {e}")
                                num_patch_width, num_patch_height = 2, 2
                            image_feature = image_feature.view(num_patch_height, num_patch_width, height, width, -1)
                        else:
                            image_feature = image_feature.view(2, 2, height, width, -1)
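One possible reason the channel dimension can silently double here (an assumption, not a confirmed diagnosis): `view` only checks that the total element count matches, so if the assumed grid (`num_patch_height`, `num_patch_width`, `height`, `width`) disagrees with how the vision tower actually tiled the image, channels fold into a spatial dimension without any error at the `view` call itself. A sketch with hypothetical sizes (3584 channels, 24x24 = 576 tokens per tile, a 2x2 anyres grid):

```python
import torch

C = 3584           # hidden size (hypothetical, matches the thread's numbers)
height = width = 24  # 336/14 = 24, so 576 tokens per tile
tiles = 4            # a 2x2 anyres grid
feats = torch.zeros(tiles * height * width, C)

# Correct grid: shapes line up with how the features were produced.
good = feats.view(2, 2, height, width, C)
print(good.shape)  # torch.Size([2, 2, 24, 24, 3584])

# A wrong grid with the same element count still "succeeds", leaking
# channels into spatial dims and producing garbage shapes downstream.
bad = feats.view(2, 2, height, width * 2, C // 2)
print(bad.shape)   # torch.Size([2, 2, 24, 48, 1792])
```

This is why the advice above to check that the number of image features is a multiple of 576 matters: if `get_anyres_image_grid_shape` returns a grid that doesn't match the actual tiling, the mismatch only surfaces later, at the `expand`/`cat` step.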
yiyexy commented 4 days ago

Could you share your model's config.json? I also don't understand where the number 3584 comes from.

blackDZS commented 4 days ago

Very strange: I hit this problem before, but it later disappeared on its own, and now I can't reproduce it.