UbiquitousLearning / mllm

Fast Multimodal LLM on Mobile Devices
https://ubiquitouslearning.github.io/mllm_website
MIT License

where is bin folder? #138

Closed siz0001 closed 1 month ago

siz0001 commented 1 month ago

Hello, could you please help me? I want to quantize the gemma-ko-2b model, but I cannot find the bin folder.

cd bin
./quantize model.mllm model_q4_k.mllm Q4_K

Also, why is the mllm model twice the size of the original safetensors file?

chenghuaWang commented 1 month ago

For the first question: you need to compile the mllm project; after that, the quantize executable will be generated. For the second question: the directly converted mllm model stores its data in fp32, whereas the safetensors file you downloaded is in fp16, hence it is twice the size.
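
As a rough sanity check (assuming roughly 2.5B parameters for gemma-2b; the exact count may differ slightly):

2.5B params x 2 bytes (fp16) ≈ 5 GB safetensors file
2.5B params x 4 bytes (fp32) ≈ 10 GB converted mllm file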

siz0001 commented 1 month ago

Thank you for your quick reply. What does "compile" mean? Could you please explain more?

chenghuaWang commented 1 month ago

Please use the following commands to compile the complete mllm project before converting and quantizing the model:

cd scripts
./build.sh
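
Once the build finishes, the quantize executable should be available and the command from the original question can be run (a sketch, assuming the build output lands in bin/ at the repository root):

cd ../bin
./quantize model.mllm model_q4_k.mllm Q4_K
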
siz0001 commented 1 month ago

Thank you, bro. Happy Mid-Autumn Festival! I also have a problem when quantizing, with both Q4_K and Q4_0. Why does this happen? My model is ko-gemma-2b, which I manually converted to an mllm model. During the quantization process, the following issue occurred:

Quantize param model.layers.7.post_attention_layernorm.weight to F32      size:8192
Quantize param model.layers.7.self_attn.k_proj.weight to Q4_0     size:294912
Quantize param model.layers.7.self_attn.o_proj.weight to Q4_0     size:2359296
Quantize param model.layers.7.self_attn.q_proj.weight to Q4_0     size:2359296
Quantize param model.layers.7.self_attn.v_proj.weight to Q4_0     size:294912 type:Q4_0
Quantize param model.layers.8.input_layernorm.weight to F32       size:8192
Killed
chenghuaWang commented 1 month ago

It might be an Out of Memory issue. How much DRAM does your machine have? The quantization process can consume a significant amount of memory.
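
If upgrading physical memory is not an option, a generic Linux workaround (not specific to mllm) is to check available memory and temporarily add swap space; quantization will be much slower while swapping, but it may avoid the process being killed:

free -h                            # check installed and available memory
sudo fallocate -l 32G /swapfile    # create a 32 GB swap file (size is only an example)
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile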

siz0001 commented 1 month ago

It was indeed a memory issue. Both model conversion and quantization went smoothly after upgrading the DRAM to 64GB. Thank you.