bentoml / OpenLLM

Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

Can openllm support local path model? #1044

Open dsp6414 opened 1 month ago

dsp6414 commented 1 month ago

How can I use OpenLLM with a local LoRA model?

dsp6414 commented 1 month ago

OpenLLM deployment steps:

1. Install OpenLLM: pip install openllm

2. Install BentoML: pip install bentoml

3. Update the OpenLLM repo: openllm repo update

4. Create a venv virtual environment: python -m uv venv /home/tcx/.openllm/venv/998690274545817638

5. Activate the venv: source /home/tcx/.openllm/venv/998690274545817638/bin/activate

6. Install the dependencies: python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt

7. Clone the model repository from Hugging Face (https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) into a local directory: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
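
If a scripted download is more convenient than cloning with git, here is a minimal sketch using huggingface_hub (an extra helper, not part of the original steps); any method that places the model files in that local directory works the same:

```python
# Sketch: fetch the model files into the local directory referenced above.
# Assumes huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2-0.5B-Instruct",
    local_dir="/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
)
```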

8. Update the model repository parameters under /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6

src/bentofile.yaml updated as follows:

conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:

bento_constants.py updated as follows:

CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''

bento.yaml updated as follows:

service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:

9. Activate the venv and start the service: change into /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src and run:

$ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
$ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
$ bentoml serve qwen2:0.5b-instruct-fp16-fcc6

or simply: bentoml serve .

10. If port 3000 is already in use, find and kill the process holding it:

netstat -tulnp | grep 3000
sudo kill -9 <PID>
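
Once bentoml serve is running, a quick sanity check of the OpenAI-compatible endpoint can be done with the standard openai Python client. A minimal sketch, assuming the service listens on the default BentoML port 3000 and that the served model name equals the local path set in ENGINE_CONFIG["model"]:

```python
# Sketch: query the OpenAI-compatible endpoint exposed by `bentoml serve`.
# Assumes the default port 3000; the api_key is a placeholder since the local
# server does not check it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

resp = client.chat.completions.create(
    model="/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(resp.choices[0].message.content)
```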

dsp6414 commented 1 month ago

1

bojiang commented 1 month ago

It seems you have a step-by-step solution. Is there anything we can help with?

dsp6414 commented 1 month ago

> It seems you have a step-by-step solution. Is there anything we can help with?

I still do not know how to load a LoRA fine-tuned model, or which YAML file to modify.

aarnphm commented 1 month ago

I don't think we have LoRA loading supported yet, but we can add this @bojiang

bojiang commented 1 month ago

As for local-path models, I think we can support that.

dsp6414 commented 1 month ago

thanks🌺

dsp6414 commented 1 month ago

openllm-models service.py

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)

Both set lora_modules=None; how do I point them at my LoRA model?
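
For reference: in vLLM's OpenAI server, lora_modules takes a list of LoRAModulePath entries rather than None. Below is a sketch of what the change could look like, reusing the names from the snippet above; the adapter name and path are hypothetical, the dataclass field names vary between vLLM versions (local_path in older releases, path in newer ones), and the underlying engine also has to be created with enable_lora=True. As aarnphm notes above, this is not supported in openllm-models yet.

```python
# Sketch only, not the current openllm-models code. Field names may differ
# across vLLM versions; the AsyncLLMEngine must be built with enable_lora=True.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

lora_modules = [
    LoRAModulePath(
        name="my-lora",                        # the name clients pass as `model`
        local_path="/home/tcx/loras/my-lora",  # hypothetical local adapter dir
    )
]

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)
```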

dsp6414 commented 1 month ago

🌼

dsp6414 commented 1 month ago

https://zhuanlan.zhihu.com/p/711869222