vllm currently does not support bitsandbytes. You can use convert_nf4_model_to_bf16.py to convert the residual model to 16-bit, and then use merge_adapter_to_base_model.py to merge it with the PiSSA module.
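For concreteness, a minimal sketch of the merge step with PEFT is below. The paths are placeholders and this is not the exact interface of merge_adapter_to_base_model.py, just the idea: load the bf16 residual model produced by the conversion step, attach the PiSSA adapter, fold it into the base weights, and save a plain 16-bit checkpoint that vLLM can serve.

```python
# Sketch only; paths and the exact script interface are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

RESIDUAL_BF16 = "path/to/residual_model_bf16"   # output of convert_nf4_model_to_bf16.py
ADAPTER_DIR = "path/to/pissa_adapter"           # the trained PiSSA module
MERGED_DIR = "path/to/merged_model"             # checkpoint to pass to vLLM

# Load the dequantized residual model in bfloat16.
base = AutoModelForCausalLM.from_pretrained(RESIDUAL_BF16, torch_dtype=torch.bfloat16)

# Attach the PiSSA adapter and merge it into the base weights.
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
merged = model.merge_and_unload()
merged.save_pretrained(MERGED_DIR)

# Save the tokenizer alongside the merged weights so vLLM finds everything in one place.
tokenizer = AutoTokenizer.from_pretrained(RESIDUAL_BF16)
tokenizer.save_pretrained(MERGED_DIR)
```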
@fxmeng Thank you for your response.
I understand that vllm does not currently support bitsandbytes. I would like to further confirm the specific steps as follows:
Model conversion: I should use convert_nf4_model_to_bf16.py to convert the model to 16-bit. How exactly do you run this script, and are there any parameters that need attention? Additionally, if there are any other prerequisites or matters to be aware of, please let me know. Thank you very much for your help!
Traceback (most recent call last):
  File "/home/changyupeng/PiSSA/inference/gsm8k_inference.py", line 136, in <module>
    gsm8k_test(model=args.model, data_path=args.data_file, start=args.start, end=args.end, batch_size=args.batch_size, tensor_parallel_size=args.tensor_parallel_size)
  File "/home/changyupeng/PiSSA/inference/gsm8k_inference.py", line 93, in gsm8k_test
    llm = LLM(model=model, tensor_parallel_size=tensor_parallel_size)
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 386, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 287, in create_engine_configs
    model_config = ModelConfig(
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/config.py", line 118, in __init__
    self._verify_quantization()
  File "/home/changyupeng/miniconda3/envs/pisa/lib/python3.10/site-packages/vllm/config.py", line 184, in _verify_quantization
    raise ValueError(
ValueError: Unknown quantization method: bitsandbytes. Must be one of ['awq', 'gptq', 'squeezellm', 'marlin'].
I expect the program to run successfully and output the prediction results.
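Once the merged bf16 checkpoint exists, vLLM should load it without any quantization argument, so the check that raises the ValueError above is never reached. A minimal sketch (the model path and prompt are placeholders):

```python
# Sketch only; point vLLM at the merged 16-bit checkpoint, not the NF4 model.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/merged_model", tensor_parallel_size=1)
params = SamplingParams(temperature=0.0, max_tokens=256)

outputs = llm.generate(["Natalia sold clips to 48 of her friends ..."], params)
print(outputs[0].outputs[0].text)
```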