-
### Your current environment
Using official Docker image.
### 🐛 Describe the bug
Using Docker image: vllm/vllm-openai:latest
Params:
```
--model=mistralai/Mistral-7B-Instruct-v0.3
--gpu-memo…
-
This paper describes a method similar to speculative sampling: it samples the lower-quality model to identify tokens to avoid, thereby increasing the quality of the output of the higher-quality mod…
-
This might be of interest:
https://huggingface.co/papers/2402.11131
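For context, the classic speculative-sampling acceptance rule keeps a draft token with probability min(1, p/q), where p and q are the target and draft model probabilities for that token. A minimal sketch of that rule (the function name and interface are illustrative, not taken from the linked paper):

```python
import random

def accept_draft_token(p_target: float, q_draft: float, rng=random.random) -> bool:
    """Speculative-sampling accept step: keep a token proposed by the
    draft model with probability min(1, p_target / q_draft).

    p_target: probability the target (higher-quality) model assigns the token.
    q_draft:  probability the draft (lower-quality) model assigned it.
    rng:      callable returning a uniform sample in [0, 1); injectable for tests.
    """
    if q_draft <= 0.0:
        # Draft model could not have proposed this token; reject defensively.
        return False
    return rng() < min(1.0, p_target / q_draft)
```

On rejection, the standard scheme resamples from the normalized difference of the two distributions, which the sketch above omits.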
-
## 🚀 Feature
Add Attention Sinks (https://arxiv.org/pdf/2309.17453.pdf, https://github.com/tomaarsen/attention_sinks/) to MLC.
## Motivation
mlc_chat_cli gets noticeably slower as the conversatio…
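For reference, the attention-sinks policy in the linked paper keeps the first few positions (the "sinks") plus a sliding window of the most recent positions in the KV cache, which is why generation speed stays flat as the conversation grows. A minimal sketch of which cache positions survive (function name and defaults are illustrative; the paper uses 4 sink tokens):

```python
def kept_positions(seq_len: int, num_sinks: int = 4, window: int = 1020) -> list:
    """Return the KV-cache positions retained under a StreamingLLM-style
    policy: the first `num_sinks` positions plus a sliding window of the
    `window` most recent positions."""
    if seq_len <= num_sinks + window:
        # Cache still fits; nothing is evicted.
        return list(range(seq_len))
    # Attention sinks at the front, sliding window at the back.
    return list(range(num_sinks)) + list(range(seq_len - window, seq_len))
```

Everything between the sinks and the window is evicted, so per-token cost is bounded regardless of conversation length.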
-
I ran into a series of issues trying to get vLLM stood up on a system with multiple MI210s. I figured I'd document my issues and workarounds so that someone could pick up the baton later, or at least …
-
### The following items must be checked before submission
- [X] Make sure you are using the latest code from the repository (git pull); some issue…
-
### Before submitting a bug report
- [X] I updated to the latest version of Multi-Account Container and tested if I can reproduce the issue
- [X] I searched for existing reports to see if it hasn't a…
-
### Your current environment
vllm version: 0.4.2
```
CUDA_VISIBLE_DEVICES=6 python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --dtype au…
-
### System Info
CUDA: 12.5
Python: 3.9
Ubuntu 22.04
### Running Xinference with Docker?
- [ ] docker
- [X] pip install
- [ ] instal…
-
Run with:
```
python -m vllm.entrypoints.openai.api_server --model vicuna-7b-v1.5 --trust-remote-code
curl http://localhost:8000/generate -d '{"prompt": "Below is an instruction that describes a
ta…
```
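The truncated curl above posts a JSON body to the legacy `/generate` endpoint. A minimal sketch of assembling a well-formed payload in Python (the field values are illustrative; `prompt`, `max_tokens`, and `temperature` are standard vLLM sampling fields):

```python
import json

# Illustrative request body for vLLM's legacy /generate endpoint.
payload = {
    "prompt": "Below is an instruction that describes a task.",
    "max_tokens": 64,
    "temperature": 0.7,
}

# Serialize once so the JSON sent over the wire is guaranteed well-formed,
# avoiding the quoting pitfalls of hand-writing JSON inside a curl -d string.
body = json.dumps(payload)
```

Note that the OpenAI-compatible server (`vllm.entrypoints.openai.api_server`) serves `/v1/completions` rather than `/generate`, so the endpoint must match the entrypoint being launched.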