bentoml / OpenLLM

Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

Can openllm support local path model? #1044

Open dsp6414 opened 1 month ago

dsp6414 commented 1 month ago

How can I use OpenLLM with a local LoRA model?

dsp6414 commented 1 month ago

OpenLLM deployment steps:

1. Install OpenLLM: pip install openllm

2. Install BentoML: pip install bentoml

3. Update the OpenLLM repo: openllm repo update

4. Create a venv virtual environment: python -m uv venv /home/tcx/.openllm/venv/998690274545817638

5. Activate the venv: source /home/tcx/.openllm/venv/998690274545817638/bin/activate

6. Install the dependencies: python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt

7. Clone the model repository from Hugging Face (https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) into a local directory: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
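
If a scripted download is more convenient than cloning with git, here is a minimal sketch using huggingface_hub (an extra helper, not part of the original steps); any method that places the model files in that local directory works the same:

```python
# Sketch: fetch the model files into the local directory referenced above.
# Assumes huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2-0.5B-Instruct",
    local_dir="/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
)
```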

8. Update the model repository parameters under /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6

src/bentofile.yaml updated as follows:

conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:

bento_constants.py updated as follows:

CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''

bento.yaml updated as follows:

service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:

9. Activate the venv and start the service: change into /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src and run:

$ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
$ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
$ bentoml serve qwen2:0.5b-instruct-fp16-fcc6

or simply: bentoml serve .

10. If port 3000 is already in use, find and kill the process holding it:

netstat -tulnp | grep 3000
sudo kill -9 <PID>
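
Once bentoml serve is running, a quick sanity check of the OpenAI-compatible endpoint can be done with the standard openai Python client. A minimal sketch, assuming the service listens on the default BentoML port 3000 and that the served model name equals the local path set in ENGINE_CONFIG["model"]:

```python
# Sketch: query the OpenAI-compatible endpoint exposed by `bentoml serve`.
# Assumes the default port 3000; the api_key is a placeholder since the local
# server does not check it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

resp = client.chat.completions.create(
    model="/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(resp.choices[0].message.content)
```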

dsp6414 commented 1 month ago

1

bojiang commented 1 month ago

It seems you have a step-by-step solution. Is there anything we can help with?

dsp6414 commented 1 month ago

> It seems you have a step-by-step solution. Is there anything we can help with?

I still do not know how to load a LoRA fine-tuned model, or which YAML file to modify.

aarnphm commented 1 month ago

I don't think we have LoRA loading supported yet, but we can add this @bojiang

bojiang commented 1 month ago

As for local-path models, I think we can support that.

dsp6414 commented 1 month ago

thanks🌺

dsp6414 commented 1 month ago

openllm-models service.py

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)

Both set lora_modules=None; how do I point them at my LoRA model?
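
For reference: in vLLM's OpenAI server, lora_modules takes a list of LoRAModulePath entries rather than None. Below is a sketch of what the change could look like, reusing the names from the snippet above; the adapter name and path are hypothetical, the dataclass field names vary between vLLM versions (local_path in older releases, path in newer ones), and the underlying engine also has to be created with enable_lora=True. As aarnphm notes above, this is not supported in openllm-models yet.

```python
# Sketch only, not the current openllm-models code. Field names may differ
# across vLLM versions; the AsyncLLMEngine must be built with enable_lora=True.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

lora_modules = [
    LoRAModulePath(
        name="my-lora",                        # the name clients pass as `model`
        local_path="/home/tcx/loras/my-lora",  # hypothetical local adapter dir
    )
]

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)
```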

dsp6414 commented 1 month ago

🌼

dsp6414 commented 1 month ago

https://zhuanlan.zhihu.com/p/711869222