vllm currently does not support bitsandbytes. You can use convert_nf4_model_to_bf16.py to convert the residual model to 16-bit, and then use merge_adapter_to_base_model.py to merge it with the PiSSA module.
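For concreteness, a minimal sketch of the merge step with PEFT is below. The paths are placeholders and this is not the exact interface of merge_adapter_to_base_model.py, just the idea: load the bf16 residual model produced by the conversion step, attach the PiSSA adapter, fold it into the base weights, and save a plain 16-bit checkpoint that vLLM can serve.

```python
# Sketch only; paths and the exact script interface are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

RESIDUAL_BF16 = "path/to/residual_model_bf16"   # output of convert_nf4_model_to_bf16.py
ADAPTER_DIR = "path/to/pissa_adapter"           # the trained PiSSA module
MERGED_DIR = "path/to/merged_model"             # checkpoint to pass to vLLM

# Load the dequantized residual model in bfloat16.
base = AutoModelForCausalLM.from_pretrained(RESIDUAL_BF16, torch_dtype=torch.bfloat16)

# Attach the PiSSA adapter and merge it into the base weights.
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
merged = model.merge_and_unload()
merged.save_pretrained(MERGED_DIR)

# Save the tokenizer alongside the merged weights so vLLM finds everything in one place.
tokenizer = AutoTokenizer.from_pretrained(RESIDUAL_BF16)
tokenizer.save_pretrained(MERGED_DIR)
```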
@fxmeng Thank you for your response.
I understand that vllm does not currently support bitsandbytes. I would like to further confirm the specific steps as follows:
Model conversion: I should use convert_nf4_model_to_bf16.py to convert the model to 16-bit. How exactly do you run this script, and are there any parameters that need attention? Additionally, if there are any other prerequisites or matters to be aware of, please let me know. Thank you very much for your help!
Traceback (most recent call last):
  File "/home/changyupeng/PiSSA/inference/gsm8k_inference.py", line 136, in <module>
    gsm8k_test(model=args.model, data_path=args.data_file, start=args.start, end=args.end, batch_size=args.batch_size, tensor_parallel_size=args.tensor_parallel_size)
  File "/home/changyupeng/PiSSA/inference/gsm8k_inference.py", line 93, in gsm8k_test
    llm = LLM(model=model, tensor_parallel_size=tensor_parallel_size)
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 386, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 287, in create_engine_configs
    model_config = ModelConfig(
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/config.py", line 118, in __init__
    self._verify_quantization()
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/config.py", line 184, in _verify_quantization
    raise ValueError(
ValueError: Unknown quantization method: bitsandbytes. Must be one of ['awq', 'gptq', 'squeezellm', 'marlin'].
I expect the program to run successfully and output the prediction results.
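Once the merged bf16 checkpoint exists, vLLM should load it without any quantization argument, so the check that raises the ValueError above is never reached. A minimal sketch (the model path and prompt are placeholders):

```python
# Sketch only; point vLLM at the merged 16-bit checkpoint, not the NF4 model.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/merged_model", tensor_parallel_size=1)
params = SamplingParams(temperature=0.0, max_tokens=256)

outputs = llm.generate(["Natalia sold clips to 48 of her friends ..."], params)
print(outputs[0].outputs[0].text)
```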