Closed by jzhang38 2 months ago
Thanks! This is a great feature enabling inference for larger models.
However, could you post a result screenshot or run more tests to check whether the results match?
Hi @jzhang38 🦦
1.5 7B:

```yaml
- model: llava_sglang
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme,ai2d,scienceqa_img
  batch_size: 1
  log_samples: true
  log_samples_suffix: eval_mme
  output_path: "./logs/"
```
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| mme | Yaml | none | 0 | mme_cognition_score | 352.5000 | ± | N/A |
| | | none | 0 | mme_percetion_score | 1511.3936 | ± | N/A |
| ai2d | Yaml | none | 0 | exact_match | 55.6023 | ± | 0.0089 |
| scienceqa_img | Yaml | none | 0 | exact_match | 69.5092 | ± | 0.0103 |
The results match the reference numbers pretty closely.
1.5 13B:

```yaml
- model: llava_sglang
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,ai2d,scienceqa_img
  batch_size: 1
  log_samples: true
  log_samples_suffix: eval_mme
  output_path: "./logs/"
```
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| ai2d | Yaml | none | 0 | exact_match | 59.1645 | ± | 0.0088 |
| mme | Yaml | none | 0 | mme_cognition_score | 295.0000 | ± | N/A |
| | | none | 0 | mme_percetion_score | 1523.5189 | ± | N/A |
| scienceqa_img | Yaml | none | 0 | exact_match | 72.8309 | ± | 0.0099 |
@Luodian
Adds llava_sglang.
Some caveats:
Example eval config and script: