OpenBMB / MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Apache License 2.0

[Bug]: Convert_weight #80

Closed Vinaysukhesh98 closed 6 months ago

Vinaysukhesh98 commented 6 months ago

Is there an existing issue?

Describe the bug

.local/lib/python3.10/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}>(tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue)#1}> >::Call(tvm::runtime::PackedFuncObj const, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
  9: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
  8: tvm::codegen::Build(tvm::IRModule, tvm::Target)
  7: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}>(tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue)#1}> >::Call(tvm::runtime::PackedFuncObj const, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
  6: tvm::codegen::BuildSPIRV(tvm::IRModule, tvm::Target)
  5: tvm::codegen::LowerToSPIRV[abi:cxx11](tvm::IRModule, tvm::Target)
  4: tvm::codegen::CodeGenSPIRV::BuildFunction(tvm::tir::PrimFunc const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
  3: tvm::codegen::spirv::IRBuilder::GetSType(tvm::runtime::DataType const&, unsigned int, unsigned int)
  2: tvm::codegen::spirv::IRBuilder::DeclareType(tvm::runtime::DataType const&, unsigned int, unsigned int)
  1: tvm::codegen::spirv::IRBuilder::AddCapabilityFor(tvm::runtime::DataType const&)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/target/spirv/ir_builder.cc", line 566
InternalError: Check failed: (spirv_support_.supports_float16) is false: Vulkan target does not support Float16 capability. If your device supports 16-bit float operations, please either add -supports_float16=1 to the target, or query all device parameters by adding -from_device=0.
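For reference, the last line of the check already points at the workaround: either declare the Float16 capability on the Vulkan target or let TVM query the device. A minimal sketch of what such a target looks like in TVM's Python API (generic TVM usage, not an mlc_chat-specific flag):

import tvm

# Ask TVM to query all parameters of the local Vulkan device (index 0),
# so capabilities such as float16 support are filled in from the driver:
target_from_device = tvm.target.Target("vulkan -from_device=0")

# Or state the capability explicitly if the device is known to support fp16:
target_fp16 = tvm.target.Target("vulkan -supports_float16=1")

print(target_from_device)
print(target_fp16)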

To Reproduce

git clone https://github.com/OpenBMB/mlc-MiniCPM.git
mkdir -p build && cd build

# generate build configuration
python3 ../cmake/gen_cmake_config.py && cd ..

# build mlc_chat_cli
cd build && cmake .. && cmake --build . --parallel $(nproc) && cd ..

# install
cd python && pip install -e . && cd ..

Expected behavior

No response

Screenshots

No response

Environment

- OS: Ubuntu 22
- PyTorch: torch 2.1.2
- CUDA: none
- Device: not specified
ml-dtypes                0.3.2
mlc-ai-nightly           0.15.1
mlc-ai-nightly-cu117     0.12.dev2149
mlc-chat                 0.1.dev0       mlc-MiniCPM/python
torch                    2.1.2

Additional context

No response

Vinaysukhesh98 commented 6 months ago

I was able to compile with the float32 dtype, but I get the error below while running:

mlc_chat gen_config --model-type ${MODEL_TYPE} ./dist/models/${MODEL_NAME}-hf/ --quantization $QUANTIZATION --conv-template LM --sliding-window-size 768 -o dist/${MODEL_NAME}/

mlc-MiniCPM/python/mlc_chat/interface/gen_config.py", line 149, in gen_config
    fast_tokenizer.backend_tokenizer.save(str(tokenizer_json_save_dest))
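For context, the failing line in gen_config is serializing the HuggingFace fast tokenizer's backend to tokenizer.json, roughly like the sketch below (the model directory is a placeholder for the ${MODEL_NAME}-hf path, and trust_remote_code is assumed for MiniCPM-V):

from transformers import AutoTokenizer

model_dir = "./dist/models/MiniCPM-V-hf"  # placeholder: substitute your ${MODEL_NAME}-hf directory
tok = AutoTokenizer.from_pretrained(model_dir, use_fast=True, trust_remote_code=True)

# gen_config dumps the Rust-backed tokenizer to tokenizer.json in the output directory
tok.backend_tokenizer.save("tokenizer.json")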

Achazwl commented 6 months ago

mlc-MiniCPM is only tested on Android, and I notice that you are on a Linux device. On PC devices, we recommend using llama.cpp. We thank @runfuture for adding MiniCPM support to llama.cpp; you can now use llama.cpp to run MiniCPM in GGUF format.
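As a rough illustration of the llama.cpp route (a sketch using the llama-cpp-python bindings; the GGUF filename is a placeholder for whatever your conversion produces):

from llama_cpp import Llama

# Load a GGUF-format MiniCPM model (path is a placeholder)
llm = Llama(model_path="minicpm-2b.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}]
)
print(out["choices"][0]["message"]["content"])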

Achazwl commented 6 months ago

If you need to compile for an Android device, select OpenCL instead of Vulkan when preparing your MLC package.

Vinaysukhesh98 commented 6 months ago

If you need to compile for an Android device, select OpenCL instead of Vulkan when preparing your MLC package.

I have recompiled the MLC package with OpenCL but am still getting the same error. Do I need to recompile TVM?

Achazwl commented 6 months ago

OpenCL should support float16. What's the new error message?

Vinaysukhesh98 commented 6 months ago

argument --model-type: invalid choice: 'minicpm_v' (choose from 'auto', 'llama', 'mistral', 'gemma', 'gpt2', 'mixtral', 'gpt_neox', 'gpt_bigcode', 'phi-msft', 'phi', 'qwen', 'qwen2', 'stablelm', 'baichuan', 'internlm', 'rwkv5')

Achazwl commented 6 months ago

It seems mlc-MiniCPM is not correctly installed. Have these steps been done?

Achazwl commented 6 months ago
python3 ../cmake/gen_cmake_config.py && cd ..
cd build && cmake .. && cmake --build . --parallel $(nproc) && cd ..
cd python && pip install -e . && cd ..

These commands should be run in the mlc-MiniCPM repo, not in the original MLC-LLM repo.

Vinaysukhesh98 commented 6 months ago

Yeah, I did the same and got multiple errors, like a core dump and others. I have a doubt: once we download this repo via git clone --recursive https://github.com/OpenBMB/mlc-MiniCPM.git, do we need to build TVM or not?

Achazwl commented 6 months ago

TVM is included in the 3rdparty directory; setting the PATH environment variable is enough. You can paste your errors here.
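A quick way to confirm which TVM build Python actually picks up once the environment is set (a sketch; the expected location under 3rdparty is an assumption about the repo layout):

import tvm

# Should point into the checkout's bundled copy (e.g. .../mlc-MiniCPM/3rdparty/tvm/python/tvm/...)
# rather than an unrelated site-packages build.
print(tvm.__file__)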

Vinaysukhesh98 commented 6 months ago

I'm getting the same error when running:

mlc_chat convert_weight --model-type ${MODEL_TYPE} ./dist/models/${MODEL_NAME}/ --quantization $QUANTIZATION -o dist/$MODEL_NAME/

------------------------- Usage -------------------------
usage: MLC AutoLLM Quantization Framework [-h]
       --quantization {q0f16,q0f32,q3f16_0,q3f16_1,q4f16_0,q4f16_1,q4f32_1,q4f16_2,q4f16_autoawq,q4f16_ft}
       [--model-type {auto,llama,mistral,gemma,gpt2,mixtral,gpt_neox,gpt_bigcode,phi-msft,phi,qwen,qwen2,stablelm,baichuan,internlm,rwkv5}]
       [--device DEVICE] [--source SOURCE]
       [--source-format {auto,huggingface-torch,huggingface-safetensor,awq}]
       --output OUTPUT
       config

positional arguments:
  config                1) Path to a HuggingFace model directory that contains a config.json or 2) Path to config.json in HuggingFace format, or 3) The name of a pre-defined model architecture. A config.json file in HuggingFace format defines the model architecture, including the vocabulary size, the number of layers, the hidden size, number of attention heads, etc. Example: https://huggingface.co/codellama/CodeLlama-7b-hf/blob/main/config.json. A HuggingFace directory often contains a config.json which defines the model architecture, the non-quantized model weights in PyTorch or SafeTensor format, tokenizer configurations, as well as an optional generation_config.json that provides additional default configuration for text generation. Example: https://huggingface.co/codellama/CodeLlama-7b-hf/tree/main. (required)

options:
  -h, --help            show this help message and exit
  --quantization {q0f16,q0f32,q3f16_0,q3f16_1,q4f16_0,q4f16_1,q4f32_1,q4f16_2,q4f16_autoawq,q4f16_ft}
                        The quantization mode we use to compile. If unprovided, will infer from model. (required, choices: q0f16, q0f32, q3f16_0, q3f16_1, q4f16_0, q4f16_1, q4f32_1, q4f16_2, q4f16_autoawq, q4f16_ft)
  --model-type {auto,llama,mistral,gemma,gpt2,mixtral,gpt_neox,gpt_bigcode,phi-msft,phi,qwen,qwen2,stablelm,baichuan,internlm,rwkv5}
                        Model architecture such as "llama". If not set, it is inferred from mlc-chat-config.json. (default: "auto")
  --device DEVICE       The device used to do quantization such as "cuda" or "cuda:0". Will detect from local available GPUs if not specified. (default: "auto")
  --source SOURCE       The path to original model weight, infer from config if missing. (default: "auto")
  --source-format {auto,huggingface-torch,huggingface-safetensor,awq}
                        The format of source model weight, infer from config if missing. (default: "auto", choices: auto, huggingface-torch, huggingface-safetensor, awq)
  --output OUTPUT, -o OUTPUT
                        The output directory to save the quantized model weight. Will create params_shard_*.bin and ndarray-cache.json in this directory. (required)

------------------------- Error -------------------------
argument --model-type: invalid choice: 'minicpm_v' (choose from 'auto', 'llama', 'mistral', 'gemma', 'gpt2', 'mixtral', 'gpt_neox', 'gpt_bigcode', 'phi-msft', 'phi', 'qwen', 'qwen2', 'stablelm', 'baichuan', 'internlm', 'rwkv5')

The model type is not found.

Achazwl commented 6 months ago

What's the error when you run the following commands under the mlc-MiniCPM folder?

python3 ../cmake/gen_cmake_config.py && cd ..
cd build && cmake .. && cmake --build . --parallel $(nproc) && cd ..
cd python && pip install -e . && cd ..

I guess you had the MLC-LLM version of mlc-chat installed first and the mlc-MiniCPM version of mlc-chat was not installed successfully, so when you use mlc_chat to convert weights, Python finds the MLC-LLM version. Try uninstalling mlc-chat first.
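A quick way to check which mlc_chat installation Python resolves (per the pip list above, the editable install should live under mlc-MiniCPM/python):

import mlc_chat

# If this prints a site-packages path instead of .../mlc-MiniCPM/python/mlc_chat/...,
# the MLC-LLM build is shadowing the editable install; uninstall it first:
#   pip uninstall mlc-chat
# then reinstall with `pip install -e .` from mlc-MiniCPM/python.
print(mlc_chat.__file__)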

Vinaysukhesh98 commented 6 months ago

I just removed all the Vulkan drivers and rebuilt with Vulkan set to no and OpenCL set to yes, but the error is still the same:

mlc_chat convert_weight --model-type ${MODEL_TYPE} ./dist/models/${MODEL_NAME}/ --quantization $QUANTIZATION -o dist/$MODEL_NAME/

[2024-03-06 20:07:02] INFO auto_config.py:115: Found model configuration: dist/models/MiniCPM-V/config.json
[2024-03-06 20:07:02] INFO auto_device.py:85: Not found device: cuda:0
[2024-03-06 20:07:02] INFO auto_device.py:85: Not found device: rocm:0
[2024-03-06 20:07:02] INFO auto_device.py:85: Not found device: metal:0
[2024-03-06 20:07:03] INFO auto_device.py:76: Found device: vulkan:0
[2024-03-06 20:07:03] INFO auto_device.py:85: Not found device: opencl:0
[2024-03-06 20:07:03] INFO auto_device.py:33: Using device: vulkan:0
[2024-03-06 20:07:03] INFO auto_weight.py:70: Finding weights in: dist/models/MiniCPM-V
[2024-03-06 20:07:03] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-03-06 20:07:03] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: dist/models/MiniCPM-V/model.safetensors.index.json
[2024-03-06 20:07:03] INFO auto_weight.py:106: Using source weight configuration: dist/models/MiniCPM-V/model.safetensors.index.json. Use --source to override.
[2024-03-06 20:07:03] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use --source-format to override.
[2024-03-06 20:07:03] INFO auto_config.py:153: Found model type: minicpm_v. Use --model-type to override.
Weight conversion with arguments:
  --config         dist/models/MiniCPM-V/config.json
  --quantization   GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type     minicpm_v
  --device         vulkan:0
  --source         dist/models/MiniCPM-V/model.safetensors.index.json
  --source-format  huggingface-safetensor
  --output         dist/MiniCPM-V
[2024-03-06 20:07:03] INFO mistral_model.py:55: prefill_chunk_size defaults to sliding_window_size (4096)
[2024-03-06 20:07:06] WARNING utils.py:25: Unused extern parameters: llm.lm_head.weight
[2024-03-06 20:07:06] INFO huggingface_loader.py:169: Loading HF parameters from: dist/models/MiniCPM-V/model-00002-of-00002.safetensors
[2024-03-06 20:07:06] INFO huggingface_loader.py:129: [Not quantized] Parameter: "llm.model.layers.36.input_layernorm.weight", shape: (2304,), dtype: float16
[2024-03-06 20:07:06] INFO group_quantization.py:230: Compiling quantize function for key: ((2304, 5760), float16, vulkan, axis=1, output_transpose=False)
  0%|▎ | 1/578 [00:00<07:20, 1.31it/s]
Traceback (most recent call last):
  File "/home/mbulab5008/.local/bin/mlc_chat", line 33, in <module>
    sys.exit(load_entry_point('mlc-chat', 'console_scripts', 'mlc_chat')())
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "ocal/lib/python3.10/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}>(tvm::__mk_TVM23::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue)#1}> >::Call(tvm::runtime::PackedFuncObj const, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
  9: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
  8: tvm::codegen::Build(tvm::IRModule, tvm::Target)
  7: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}>(tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue)#1}> >::Call(tvm::runtime::PackedFuncObj const, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tvm::runtime::TVMRetValue)
  6: tvm::codegen::BuildSPIRV(tvm::IRModule, tvm::Target)
  5: tvm::codegen::LowerToSPIRV[abi:cxx11](tvm::IRModule, tvm::Target)
  4: tvm::codegen::CodeGenSPIRV::BuildFunction(tvm::tir::PrimFunc const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
  3: tvm::codegen::spirv::IRBuilder::GetSType(tvm::runtime::DataType const&, unsigned int, unsigned int)
  2: tvm::codegen::spirv::IRBuilder::DeclareType(tvm::runtime::DataType const&, unsigned int, unsigned int)
  1: tvm::codegen::spirv::IRBuilder::AddCapabilityFor(tvm::runtime::DataType const&)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/target/spirv/ir_builder.cc", line 566
InternalError: Check failed: (spirv_support_.supports_float16) is false: Vulkan target does not support Float16 capability. If your device supports 16-bit float operations, please either add -supports_float16=1 to the target, or query all device parameters by adding -from_device=0.
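The auto_device lines above show how the device gets picked: convert_weight probes cuda, rocm, metal, vulkan and opencl, and here only vulkan:0 is found, which is why it keeps compiling for Vulkan. A quick check of what this TVM build actually detects (a minimal sketch):

import tvm

# Probe the same runtime devices that the auto-detection log reports
for name in ("cuda", "rocm", "metal", "vulkan", "opencl"):
    dev = tvm.device(name, 0)
    print(f"{name}:0 exists = {dev.exist}")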


Vinaysukhesh98 commented 6 months ago

Hi, after reinstalling the setup I am still getting the same error. How can I resolve it?

Vinaysukhesh98 commented 6 months ago

I have installed the OpenCL libraries and built with only OpenCL enabled, but the detected device is still Vulkan, even though I removed all the Vulkan drivers. What could be the possible issue?

Achazwl commented 6 months ago

This may be related to MLC-LLM (https://github.com/mlc-ai/mlc-llm/issues/1606). I cannot reproduce it here.