[Open] Lucas-16 opened this issue 2 months ago
```
Traceback (most recent call last):
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/anaconda3/envs/openllm/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 74, in lifespan
    await on_startup()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/_bentoml_impl/server/app.py", line 275, in create_instance
    self._service_instance = self.service()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/_bentoml_sdk/service/factory.py", line 257, in __call__
    instance = self.inner()
  File "/root/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-33df/src/service.py", line 99, in __init__
    self.engine = AsyncLLMEngine.from_engine_args(ENGINE_ARGS)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 466, in from_engine_args
    engine = cls(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 380, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 547, in _init_engine
    return engine_class(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 251, in __init__
    self.model_executor = executor_class(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 34, in _init_executor
    self.driver_worker = self._create_worker()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 85, in _create_worker
    return create_worker(**self._get_create_worker_kwargs(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 20, in create_worker
    wrapper.init_worker(**kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 367, in init_worker
    self.worker = worker_class(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/worker.py", line 90, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 651, in __init__
    self.attn_backend = get_attn_backend(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/attention/selector.py", line 46, in get_attn_backend
    backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/attention/selector.py", line 149, in which_attn_to_use
    if current_platform.get_device_capability()[0] < 8:
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/platforms/cuda.py", line 49, in get_device_capability
    return get_physical_device_capability(physical_device_id)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/platforms/cuda.py", line 18, in wrapper
    pynvml.nvmlInit()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1793, in nvmlInit
    nvmlInitWithFlags(0)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1776, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1823, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 855, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
```
Maybe you can try the llama.cpp models, but by default vLLM requires a GPU to be available.
All models supported by OpenLLM today require an NVIDIA GPU or Apple silicon to run. We may add more options in the future, or you can contribute at https://github.com/bentoml/OpenLLM-models
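If you need a CPU-only path today, a minimal sketch using llama-cpp-python directly (not OpenLLM's API) could look like the following; the GGUF filename is an assumption, so substitute whichever quantized Qwen2 0.5B build you actually have:

```python
from llama_cpp import Llama

# Hypothetical local GGUF file -- download a quantized Qwen2 0.5B build first.
llm = Llama(
    model_path="./qwen2-0_5b-instruct-q4_k_m.gguf",
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # 0 keeps all layers on the CPU
)

out = llm("Q: What does BentoML do? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```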
Describe the bug
I want to run Qwen2 0.5B on a k8s cluster without a GPU, but the service startup has failed so far. Is there any way to support CPU-only machines?
To reproduce
No response
Logs
No response
Environment
CPU only, no GPU available
System information (Optional)
No response