EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

llava_hf inference is extremely slow #387

Open luomancs opened 3 weeks ago

luomancs commented 3 weeks ago

Hi there,

Thank you for the benchmark. I tried this repo to run inference with the llava_hf model type and the llava-hf/llava-v1.6-mistral-7b-hf checkpoint on infovqa. However, inference is extremely slow, and it also runs out of memory even on an A100 GPU with 80 GB of memory. This is my command line:

```shell
python -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava_hf \
    --model_args pretrained="llava-hf/llava-v1.6-mistral-7b-hf" \
    --tasks infovqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.6_mistral_infovqa_scienceqa_docvqa \
    --output_path ./logs/
```

On the other hand, running inference with the llava model type and the liuhaotian/llava-v1.6-mistral-7b checkpoint on the same dataset takes only about 45 minutes, with no OOM issues.
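For reference, the corresponding llava run uses the same launcher with only `--model` and `--model_args` swapped (a sketch assuming the remaining flags match the llava_hf command above; the exact invocation may have differed slightly):

```shell
python -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.6-mistral-7b" \
    --tasks infovqa \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```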

Thanks for your guidance.

kcz358 commented 3 weeks ago

There was a recent fix in #386 that addresses some llava_hf issues possibly related to this one. Can you try that branch?