Closed: siz0001 closed this issue 1 month ago
Hello, would you please help me? I want to quantize the gemma-ko-2b model, but I cannot find the bin folder.
cd bin
./quantize model.mllm model_q4_k.mllm Q4_K
Also, why is the mllm model twice the size of the original safetensors file?
For the first question: you need to compile the mllm project; the quantize executable is generated by the build. For the second question: the directly converted mllm model stores its data in fp32, whereas the safetensors file you downloaded is in fp16, hence it is twice the size.
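To make that concrete: assuming gemma-2b has roughly 2.5B parameters (my estimate, not an exact figure), fp32 storage takes about 2.5e9 × 4 bytes ≈ 10 GB, while fp16 takes 2.5e9 × 2 bytes ≈ 5 GB, which matches the 2× difference you are seeing.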
Thank you for your quick reply. What does "compile" mean here? Would you please explain a bit more?
Please use the following commands to compile the complete mllm project before using the converted model:
cd scripts
./build.sh
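Once the build completes, the quantize executable should appear under the bin folder (this assumes the repo's default build layout; I haven't checked other configurations). The end-to-end flow then looks roughly like:
cd scripts
./build.sh    # compiles the project and generates the tools, including quantize
cd ../bin     # the built executables land here
./quantize model.mllm model_q4_k.mllm Q4_K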
Thank you, bro. Happy Mid-Autumn Festival! Also, I have a problem when quantizing, with both Q4_K and Q4_0. Why does this happen? My model is ko-gemma-2b, which I manually converted to an mllm model. During the quantization process, the following occurred:
Quantize param model.layers.7.post_attention_layernorm.weight to F32 size:8192
Quantize param model.layers.7.self_attn.k_proj.weight to Q4_0 size:294912
Quantize param model.layers.7.self_attn.o_proj.weight to Q4_0 size:2359296
Quantize param model.layers.7.self_attn.q_proj.weight to Q4_0 size:2359296
Quantize param model.layers.7.self_attn.v_proj.weight to Q4_0 size:294912 type:Q4_0
Quantize param model.layers.8.input_layernorm.weight to F32 size:8192
Killed
It might be an Out of Memory issue. How much DRAM does your machine have? The quantization process can consume a significant amount of memory.
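If upgrading the RAM isn't an option right away, you can check what you have and, as a stopgap, add swap so the process can finish (it will be slow); a sketch, assuming a Linux machine:
free -h                          # check installed and available memory
sudo fallocate -l 16G /swapfile  # create a 16 GB swap file (size is just an example)
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile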
It was indeed a memory issue. Both model conversion and quantization went smoothly after upgrading the DRAM to 64GB. Thank you.