Closed: zjysteven closed this issue 1 month ago.
Hi, I don't think this is currently supported in lmms-eval, and you are definitely welcome to raise a PR for this feature.
If you want to run inference with large models, I strongly suggest using llava_sglang; in my tests it is comparably fast, or even faster, than quantized inference.
I see. Thank you both. Closing now.
Hello,
As the title suggests, I'm wondering if lmms-eval has plans to enable evaluation of quantized LMMs, e.g. those quantized with AWQ.
Why is this necessary? Inference cost will keep increasing as we get 1) more and larger benchmarks and 2) larger models (for example, LLaVA-NeXT already has 72B and 110B versions). Quantized models are therefore of great interest to the broad community (researchers and beyond), so being able to evaluate them is important.
I'm opening this issue mainly to see if this is something lmms-eval will support. If so, I'm interested in contributing to this feature in some way.
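For context, AWQ-style methods store weights in low bit-width (typically 4-bit) groups with a per-group scale. The sketch below is purely illustrative (it is not lmms-eval or AWQ code, and the function names are made up) and shows group-wise symmetric round-to-nearest quantization, the basic building block such methods start from:

```python
# Illustrative sketch only: group-wise symmetric 4-bit quantization.
# Not lmms-eval or AWQ code; function names are hypothetical.

def quantize_group(weights, n_bits=4):
    """Quantize one group of weights to signed n_bits integers
    with a single shared scale (symmetric, round-to-nearest)."""
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid div-by-zero
    # Clamp to the representable range [-qmax-1, qmax], e.g. [-8, 7]
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct approximate float weights from integers + scale."""
    return [v * scale for v in q]

# Example: quantize a small weight group and reconstruct it
weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_group(weights)
recon = dequantize_group(q, scale)
```

Real AWQ additionally searches for per-channel scaling factors that protect the most activation-salient weights before quantizing, which is why its accuracy holds up better than plain round-to-nearest.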
Thanks