Closed: zjysteven closed this issue 1 month ago.
Hi, I don't think this is currently supported in lmms-eval, and you are definitely welcome to raise a PR for this feature.
If you want to run inference with large models, I strongly suggest using llava_sglang; in my tests it is comparably fast, or even faster, than quantized inference.
I see. Thank you both. Closing now.
Hello,
As the title suggests, I'm wondering if lmms-eval has plans to enable evaluation of quantized LMMs, e.g. those quantized with AWQ.
Why is this necessary? Inference cost will keep increasing as we get 1) more and larger benchmarks and 2) larger models (for example, LLaVA-NeXT already has 72B and 110B versions). Quantized models are therefore of great interest to the broad community (researchers and beyond), so being able to evaluate them is important.
I'm opening this issue mainly to see if this is something lmms-eval will support. If so, I'm interested in contributing to this feature in some way.
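For context, AWQ-style methods store weights in low bit-width (typically 4-bit) groups with a per-group scale. The sketch below is purely illustrative (it is not lmms-eval or AWQ code, and the function names are made up) and shows group-wise symmetric round-to-nearest quantization, the basic building block such methods start from:

```python
# Illustrative sketch only: group-wise symmetric 4-bit quantization.
# Not lmms-eval or AWQ code; function names are hypothetical.

def quantize_group(weights, n_bits=4):
    """Quantize one group of weights to signed n_bits integers
    with a single shared scale (symmetric, round-to-nearest)."""
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid div-by-zero
    # Clamp to the representable range [-qmax-1, qmax], e.g. [-8, 7]
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct approximate float weights from integers + scale."""
    return [v * scale for v in q]

# Example: quantize a small weight group and reconstruct it
weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_group(weights)
recon = dequantize_group(q, scale)
```

Real AWQ additionally searches for per-channel scaling factors that protect the most activation-salient weights before quantizing, which is why its accuracy holds up better than plain round-to-nearest.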
Thanks