-
@youkaichao
### Your current environment
My environment:
Name: vllm
Version: 0.4.2+cu117
### 🐛 Describe the bug
I quantized the model (Qwen2_72B) with AWQ myself; when I want to set api s…
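For reference, a minimal sketch of loading a locally AWQ-quantized checkpoint with vLLM 0.4.x's offline API; the checkpoint path and tensor-parallel size below are placeholders, not values from this report:

```python
from vllm import LLM, SamplingParams

# Load a locally AWQ-quantized checkpoint; the path is a placeholder.
# A 72B model generally needs tensor parallelism across several GPUs.
llm = LLM(
    model="/path/to/qwen2-72b-awq",
    quantization="awq",
    tensor_parallel_size=4,
    dtype="half",  # vLLM's AWQ kernels expect float16
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```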
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
When starting the 110b service, the following command produces no result for a long time:
curl…
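A request of roughly this shape can bound the wait instead of hanging indefinitely; the port (vLLM's default 8000), model name, and prompt are assumptions, not the reporter's actual command:

```python
import requests

# Assumes vLLM's OpenAI-compatible server on its default port 8000;
# the model name is a placeholder for whatever the server was started with.
try:
    r = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "qwen1.5-110b",
            "messages": [{"role": "user", "content": "hello"}],
            "max_tokens": 16,
        },
        timeout=60,  # fail fast rather than waiting forever
    )
    print(r.status_code, r.text)
except requests.exceptions.Timeout:
    print("no response within 60s -- the server may still be loading or is stuck")
```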
-
Steps:
Install ollama
Run ollama serve and ollama run qwen:0.5b
Install chatchat
Change the configuration
chatchat-config model --set_model_platforms '[{
"platform_name": "ollama…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
- [X] I have checked [#657](https://github.com/microsoft/graphrag/issues/657) to validate if my issue is covered …
-
### System Info
A100 80G
accelerate 0.31.0
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.7.0
async-timeout …
-
Kimi is currently deployed and works, but after switching Qwen to port 8001, access fails.
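A quick connectivity check (assuming the Qwen deployment exposes an OpenAI-compatible /v1/models route, which is common but not confirmed here):

```python
import requests

# Hypothetical probe: list models on the relocated Qwen endpoint.
try:
    r = requests.get("http://localhost:8001/v1/models", timeout=10)
    print(r.status_code, r.json())
except requests.exceptions.ConnectionError:
    print("nothing is listening on port 8001 -- check the service and port mapping")
```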
-
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
export USE_MODELSCOPE_HUB=1
# nohup sh ppo_qwen.sh > ppo_qwen.log 2>&1 &
CUDA_VISIBLE_DEVICES=0 py…
-
# Summary
Add support for INT4 and/or UINT4 quantization.
Refs:
https://intellabs.github.io/distiller/quantization.html
https://developer.nvidia.com/blog/int4-for-ai-inference/
https://arxiv.org/abs/2301.12017…
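For context on what INT4/UINT4 support entails at the storage level: two 4-bit values are packed per byte and unpacked before compute. A minimal numpy sketch of that packing, illustrative only and not any particular kernel's layout:

```python
import numpy as np

def pack_uint4(values: np.ndarray) -> np.ndarray:
    """Pack an even-length array of UINT4 values (0..15) into bytes, low nibble first."""
    v = values.astype(np.uint8)
    assert v.size % 2 == 0 and v.max(initial=0) <= 15
    return (v[0::2] | (v[1::2] << 4)).astype(np.uint8)

def unpack_uint4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_uint4: recover the original UINT4 values."""
    lo = packed & 0x0F
    hi = packed >> 4
    return np.stack([lo, hi], axis=1).reshape(-1)

x = np.array([1, 15, 7, 0], dtype=np.uint8)
assert np.array_equal(unpack_uint4(pack_uint4(x)), x)
```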
-
#### Context
I am running a performance comparison between [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md) and vLLM in https://github.com/ggerganov/llama.…
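For a comparison like this, a simple latency probe against both servers' OpenAI-compatible endpoints (llama.cpp's server also exposes /v1/chat/completions) keeps the measurement symmetric; the ports and model name below are placeholders:

```python
import time
import requests

ENDPOINTS = {
    # Placeholder ports: adjust to wherever each server is listening.
    "llama.cpp": "http://localhost:8080/v1/chat/completions",
    "vLLM": "http://localhost:8000/v1/chat/completions",
}

payload = {
    "model": "default",  # placeholder; vLLM needs the served model name
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}

for name, url in ENDPOINTS.items():
    t0 = time.perf_counter()
    r = requests.post(url, json=payload, timeout=120)
    dt = time.perf_counter() - t0
    print(f"{name}: {r.status_code} in {dt:.2f}s")
```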
-
`st.experimental_rerun` will be removed after 2024-04-01.
Debug: Handling user request for session state: {'discussion': '', 'rephrased_request': '', 'api_key': '', 'agents': [], 'whiteboard': '', '…
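Since Streamlit removed `st.experimental_rerun` in favor of `st.rerun` (stable since Streamlit 1.27), the fix is a rename; a sketch, where the button is illustrative and the session-state key is taken from the debug dump above:

```python
import streamlit as st

# st.experimental_rerun() was removed; st.rerun() (Streamlit >= 1.27)
# is the drop-in replacement.
if st.button("Restart discussion"):
    st.session_state["discussion"] = ""
    st.rerun()
```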