Closed AlbertG123 closed 3 months ago
I just find the guide to run " python3 chat.py --model_path {your_path}/glm-4-9b-chat --max_sequence_length 4096 --device GPU" how to run “all tools mode" ?
we need a way to run GLM4 all features on Intel CPU/GPU/NPU.
NO
这个demo只是一个让你跑起来的办法,你需要自己将这载入模型的办法替换掉我们 openai_api_server中,自行修改代码替换掉原始vLLM的代码,然后变成OpenAI API格式测试就行
Feature request / 功能建议
I just find the guide to run " python3 chat.py --model_path {your_path}/glm-4-9b-chat --max_sequence_length 4096 --device GPU" how to run “all tools mode" ?
Motivation / 动机
we need a way to run GLM4 all features on Intel CPU/GPU/NPU.
Your contribution / 您的贡献
NO