TRI-ML / vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Slow model inference during evaluation #3

Closed · zeyuanyin closed 3 months ago

zeyuanyin commented 3 months ago

Thanks for your wonderful work; it has helped me a lot.

When using this tool, I ran into very slow model inference. Evaluation takes a long time, especially on the vqa-v2-full dataset with 214,354 samples.

I noticed that batch_size is fixed to 1 due to a bug in HF and cannot be increased: https://github.com/TRI-ML/vlm-evaluation/blob/098224f2a49bcfe2d3e1d21e75967d1c1cedcca8/scripts/evaluate.py#L55-L56
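
For context, the pinned lines don't say which HF bug this is, but one common culprit is that batched generation with decoder-only models silently misbehaves under right padding, which is the default for many tokenizers. Below is a minimal sketch of the usual left-padding setup, using plain `transformers` with gpt2 as a stand-in for the actual VLM backbone; this is only an illustration, not this repo's code:

```python
# Sketch: batched generation with a decoder-only LM generally requires
# LEFT padding; right padding can silently corrupt generations, which is
# one common reason evaluation harnesses pin batch_size=1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in model; the actual VLM backbone will differ
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token

model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompts = [
    "Question: What color is the sky? Answer:",
    "Question: How many legs does a dog have? Answer:",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=16,
        pad_token_id=tokenizer.pad_token_id,
    )

# With left padding, every row's prompt occupies the same leading span,
# so we can strip it uniformly before decoding.
gen = out[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(gen, skip_special_tokens=True))
```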

I am wondering whether there is any other way to speed up inference during evaluation.

siddk commented 3 months ago

We’re working on adding robust support for batched inference to speed up workloads like this one. Unfortunately, VQA-v2 just has a massive validation set. If you’re iterating on experiments, I’d suggest trying the “subsampled” variant of the dataset (16K examples), and only running the full 200K set for final numbers!
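
To illustrate the principle (just a sketch, not the repo's actual subsampling code): a subsampled split amounts to a fixed, seeded subset of the full validation indices, so quick iteration runs stay comparable across experiments. The seed and sizes below are hypothetical placeholders:

```python
# Sketch of the idea behind a "subsampled" split (illustration only; the
# repo ships its own pre-built subsampled variant of vqa-v2).
import random

FULL_SIZE = 214_354   # vqa-v2-full examples, per the numbers in this thread
SUBSET_SIZE = 16_000  # approximate size of the subsampled variant

rng = random.Random(7)  # hypothetical fixed seed: same subset on every run
subset_indices = sorted(rng.sample(range(FULL_SIZE), SUBSET_SIZE))

# Iterate on experiments against subset_indices, then run the full
# 214K-example set once at the end for final reported numbers.
print(len(subset_indices), subset_indices[:5])
```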