OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0

Request for a 4bit quantization model for omnilmm 12B #53

Closed highkay closed 2 months ago

highkay commented 3 months ago

I have tried multi-card inference, but it failed; it seems the layers are not dispatched correctly.

    import torch
    from accelerate import init_empty_weights, load_checkpoint_and_dispatch

    # OmniLMMForCausalLM comes from this repo; model_name points to the checkpoint directory.
    with init_empty_weights():
        model = OmniLMMForCausalLM.from_pretrained(model_name, tune_clip=True, torch_dtype=torch.bfloat16)
    model = load_checkpoint_and_dispatch(
        model, model_name, dtype=torch.bfloat16, device_map="balanced",
        no_split_module_classes=['Eva', 'MistralDecoderLayer', 'ModuleList', 'Resampler'],
    )

So it would be convenient to provide a quantized model for low-memory cards; it would be nice if it fit under 20 GB on a single card or 40 GB across multiple cards.
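For rough sizing, a 4-bit build of a 12B-parameter model needs about 12e9 params × 0.5 bytes/param ≈ 6 GB for the weights alone, plus activation, KV-cache, and vision-tower overhead, so it should fit well under the 20 GB single-card budget above (a back-of-the-envelope estimate, not a measured figure).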

jiayev commented 2 months ago

Also want a quantized version of omnilmm-12b. BNB (bitsandbytes) does not seem to work.
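For context, a bitsandbytes 4-bit load through transformers usually looks like the sketch below; whether OmniLMM's custom modules survive this path is exactly what seems to fail here. The Hugging Face repo id and the trust_remote_code loading route are assumptions, not something confirmed in this thread.

    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    # Generic bitsandbytes NF4 config via transformers; not an OmniLMM-specific recipe.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModel.from_pretrained(
        "openbmb/OmniLMM-12B",          # assumed repo id, for illustration only
        quantization_config=bnb_config,
        trust_remote_code=True,
    )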

iceflame89 commented 2 months ago

We released the more powerful MiniCPM-Llama3-V 2.5 today, which has 8.5B parameters and much better performance; the int4 version is here. Welcome to use it.
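For anyone who finds this issue later, loading that int4 build is roughly as below; the repo id openbmb/MiniCPM-Llama3-V-2_5-int4 and the model.chat call follow the model card as I recall it, so double-check against the official README.

    from PIL import Image
    from transformers import AutoModel, AutoTokenizer

    # The int4 checkpoint is already quantized, so no extra bitsandbytes config is needed.
    model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5-int4', trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5-int4', trust_remote_code=True)
    model.eval()

    image = Image.open('example.jpg').convert('RGB')
    msgs = [{'role': 'user', 'content': 'Describe this image.'}]
    print(model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7))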