Closed highkay closed 2 months ago
I would also like a quantized version of OmniLMM-12B; bitsandbytes (BNB) quantization doesn't seem to work for it.
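For reference, here is a minimal sketch of the kind of bitsandbytes 4-bit load being attempted, using the standard transformers `BitsAndBytesConfig` path. The NF4 settings are common defaults, and the repo name is an assumption; whether OmniLMM-12B actually loads this way is the open question in this issue.

```python
# Sketch: attempting a bitsandbytes 4-bit (NF4) load of OmniLMM-12B.
# The model repo name is an assumption; adjust to the real checkpoint.

def bnb_4bit_settings() -> dict:
    """Typical settings for a BitsAndBytesConfig 4-bit (NF4) load."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }

if __name__ == "__main__":
    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16,
        **bnb_4bit_settings(),
    )
    # Per this issue, this load appears to fail for OmniLMM-12B;
    # shown here only to make the attempt reproducible.
    model = AutoModel.from_pretrained(
        "openbmb/OmniLMM-12B",  # assumed repo name
        trust_remote_code=True,
        quantization_config=config,
    )
```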
We released the more powerful MiniCPM-Llama3-V 2.5 today, which has 8.5B parameters and much better performance; the int4 version is here. Welcome to try it.
I have tried multi-card inference but it failed; the layers do not seem to be dispatched correctly across the GPUs.
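In case it helps debugging, this is a sketch of how multi-card dispatch is usually driven through the transformers/accelerate stack: capping per-GPU memory with `max_memory` so `device_map="auto"` spreads layers across cards instead of overflowing the first one. The model id and the 20GiB cap are assumptions for illustration.

```python
# Sketch: multi-GPU layer dispatch via accelerate's device_map="auto".
# Model id and memory caps are assumptions; adjust to your hardware.

def build_max_memory(num_gpus: int, per_gpu: str = "20GiB") -> dict:
    """Per-device memory caps; passed to from_pretrained so accelerate
    distributes layers across cards rather than filling only GPU 0."""
    return {i: per_gpu for i in range(num_gpus)}

if __name__ == "__main__":
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "openbmb/MiniCPM-Llama3-V-2_5-int4"  # assumed repo name
    max_memory = build_max_memory(torch.cuda.device_count())

    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        device_map="auto",      # let accelerate decide layer placement
        max_memory=max_memory,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True
    )
```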
So it would be convenient to provide a quantized model for cards with less memory; ideally it would fit in under 20GB on a single card or 40GB across multiple cards.