EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

add Llava-SGlang #54

Closed: jzhang38 closed this 2 months ago

jzhang38 commented 2 months ago

Add llava_sglang.

Some caveats:

  1. sglang currently supports only single-image input, so we use the first image by default.
  2. sglang has no concept of batch size; we use "parallel" instead.
  3. Launch with python -m lmms_eval instead of accelerate.
  4. sglang supports only tensor parallelism (tp_size), not data parallelism.
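To make caveats 1 and 2 concrete, here is a minimal, hypothetical sketch of the pattern in plain Python. It is not the llava_sglang code. Requests go through a thread pool that keeps at most "parallel" of them in flight, instead of being grouped into fixed-size batches, and only the first image of each request is forwarded. Request, run_parallel, and the generate callback are illustrative names; in the real model the callback would be a call into sglang.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for one evaluation request."""
    prompt: str
    images: List[str] = field(default_factory=list)


def run_parallel(
    requests: List[Request],
    generate: Callable[[str, Optional[str]], str],
    parallel: int = 4,
) -> List[str]:
    """Run `generate` over all requests with at most `parallel` in flight."""

    def one(req: Request) -> str:
        # Single-image limitation: forward only the first image, if any.
        first_image = req.images[0] if req.images else None
        return generate(req.prompt, first_image)

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # pool.map yields results in input order, so outputs align with requests.
        return list(pool.map(one, requests))


def fake_generate(prompt: str, image: Optional[str]) -> str:
    # Placeholder for a real model call.
    return f"{prompt}|{image}"


outputs = run_parallel(
    [Request("q1", ["a.png", "b.png"]), Request("q2")], fake_generate, parallel=2
)
print(outputs)  # ['q1|a.png', 'q2|None']
```

Because ThreadPoolExecutor.map preserves input order, answers can be matched back to requests by position even though they complete out of order.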

Example eval config and launch command:

- model: llava_sglang
  model_args: pretrained=liuhaotian/llava-v1.6-34b,tokenizer=liuhaotian/llava-v1.6-34b-tokenizer,conv_template=chatml,tp_size=8,parallel=4
  tasks: mme
  batch_size: 1
  log_samples: true
  log_samples_suffix: eval_mme 
  output_path: "./logs/"

python -m lmms_eval --config config.yaml 
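The model_args line is a comma-separated list of key=value pairs that becomes keyword arguments for the model. The hypothetical parser below is a simplified stand-in, not lmms-eval's own helper, whose type handling may differ. It shows how the example string above breaks down, including the integer tp_size and parallel values.

```python
def parse_model_args(arg_string: str) -> dict:
    """Split 'k1=v1,k2=v2' into a dict, converting ints and booleans."""
    args = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        key, _, value = pair.partition("=")  # split on the first '=' only
        if value.isdigit():
            parsed = int(value)
        elif value.lower() in ("true", "false"):
            parsed = value.lower() == "true"
        else:
            parsed = value
        args[key.strip()] = parsed
    return args


model_args = (
    "pretrained=liuhaotian/llava-v1.6-34b,"
    "tokenizer=liuhaotian/llava-v1.6-34b-tokenizer,"
    "conv_template=chatml,tp_size=8,parallel=4"
)
args = parse_model_args(model_args)
print(args["tp_size"], args["parallel"])  # 8 4
```

One consequence of this format is that a value cannot itself contain a comma.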

Luodian commented 2 months ago

Thanks! This is a great feature enabling inference for larger models.

However, could you post a screenshot of the results, or run more tests to check whether the results match?

Luodian commented 2 months ago

Hi @jzhang38 🦦

jzhang38 commented 2 months ago

LLaVA-1.5 7B:

- model: llava_sglang
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme,ai2d,scienceqa_img
  batch_size: 1
  log_samples: true
  log_samples_suffix: eval_mme 
  output_path: "./logs/"

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mme | Yaml | none | 0 | mme_cognition_score | 352.5000 | ± N/A |
| | | none | 0 | mme_percetion_score | 1511.3936 | ± N/A |
| ai2d | Yaml | none | 0 | exact_match | 55.6023 | ± 0.0089 |
| scienceqa_img | Yaml | none | 0 | exact_match | 69.5092 | ± 0.0103 |

The results match pretty closely.
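When judging how closely numbers like these match, note that exact_match Value is printed as a percentage while Stderr appears to be on a 0-1 scale. The sketch below checks that reading. It assumes the AI2D and ScienceQA-IMG test splits have 3,088 and 2,017 questions; both sizes are assumptions, not stated in this thread. Under those sizes, a plain binomial standard error sqrt(p(1-p)/n) reproduces the reported 0.0089 and 0.0103. The sketch then converts the AI2D stderr into an approximate 95% interval in percentage points.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of a proportion, on the 0-1 scale."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


def ci95_points(value_pct: float, stderr_frac: float) -> tuple:
    """Normal-approximation 95% interval, in percentage points."""
    half_width = 1.96 * stderr_frac * 100  # 0-1 stderr -> percentage points
    return value_pct - half_width, value_pct + half_width


# LLaVA-1.5 7B rows from the table; n values are the assumed split sizes.
print(round(binomial_stderr(0.556023, 3088), 4))  # AI2D -> 0.0089
print(round(binomial_stderr(0.695092, 2017), 4))  # ScienceQA-IMG -> 0.0103

lo, hi = ci95_points(55.6023, 0.0089)
print(f"AI2D 95% CI: {lo:.2f}-{hi:.2f}")  # AI2D 95% CI: 53.86-57.35
```

Under this reading, a reference AI2D score within roughly ±1.7 points of 55.60 would be statistically indistinguishable from this run.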

jzhang38 commented 2 months ago

LLaVA-1.5 13B:

- model: llava_sglang
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,ai2d,scienceqa_img
  batch_size: 1
  log_samples: true
  log_samples_suffix: eval_mme 
  output_path: "./logs/"

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| ai2d | Yaml | none | 0 | exact_match | 59.1645 | ± 0.0088 |
| mme | Yaml | none | 0 | mme_cognition_score | 295.0000 | ± N/A |
| | | none | 0 | mme_percetion_score | 1523.5189 | ± N/A |
| scienceqa_img | Yaml | none | 0 | exact_match | 72.8309 | ± 0.0099 |
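MME results are often quoted as a single total: the perception score (out of 2000) plus the cognition score (out of 800). The tally below uses only the MME rows reported for the two llava_sglang runs in this thread.

```python
# MME rows reported above for LLaVA-1.5 7B and 13B run through llava_sglang.
mme = {
    "7B": {"perception": 1511.3936, "cognition": 352.5},
    "13B": {"perception": 1523.5189, "cognition": 295.0},
}

# MME total = perception score + cognition score.
totals = {size: s["perception"] + s["cognition"] for size, s in mme.items()}
for size, total in totals.items():
    print(f"LLaVA-1.5 {size}: MME total = {total:.4f}")
# LLaVA-1.5 7B: MME total = 1863.8936
# LLaVA-1.5 13B: MME total = 1818.5189
```

The 13B model scores higher on perception but lower on cognition, so its MME total comes out slightly below the 7B total in these runs.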

jzhang38 commented 2 months ago

@Luodian