Running inference on Qwen2-VL-2B-Instruct-GPTQ-Int4 with vLLM fails; it keeps reporting the following error:
File "/usr/local/venv/model_llm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 130, in build_async_engine_client_from_engine_args
    if (model_is_embedding(engine_args.model, engine_args.trust_remote_code,
File "/usr/local/venv/model_llm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 71, in model_is_embedding
    return ModelConfig(model=model_name,
File "/usr/local/venv/model_llm/lib/python3.10/site-packages/vllm/config.py", line 222, in __init__
    self.max_model_len = _get_and_verify_max_len(
File "/usr/local/venv/model_llm/lib/python3.10/site-packages/vllm/config.py", line 1739, in _get_and_verify_max_len
    assert "factor" in rope_scaling
AssertionError
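For context, a minimal sketch of why this assertion fires: Qwen2-VL's config.json declares a `rope_scaling` block of type "mrope", which has no "factor" key, while this vLLM build's `_get_and_verify_max_len` unconditionally asserts that key exists. The `mrope_section` values below are illustrative, not copied from the actual model config.

```python
# Illustrative rope_scaling block as Qwen2-VL-style configs declare it
# ("mrope" type, no "factor" key); values here are placeholders.
rope_scaling = {"type": "mrope", "mrope_section": [16, 24, 24]}

# vLLM's check in _get_and_verify_max_len boils down to:
#     assert "factor" in rope_scaling
# which fails for this config:
print("factor" in rope_scaling)  # → False, hence the AssertionError
```

If this is the cause, the usual remedy is a vLLM version that recognizes "mrope"-style `rope_scaling` (Qwen2-VL support was added in newer releases), rather than editing the model's config.json by hand.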