The [server](https://github.com/ggerganov/llama.cpp/tree/master/examples/server) example has been growing in functionality and unfortunately I feel it is not very stable at the moment and there are so…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ?
-
This might be of interest:
https://huggingface.co/papers/2402.11131
-
Howdy. It seems that when I run a vLLM server and then attempt to interact with it via `HFClientVLLM`, I get an error message. Here is how to reproduce:
```bash
# Computer 1
pip install ray==2.20…
```
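For context on what the client is talking to: vLLM exposes an OpenAI-compatible HTTP API, so the request `HFClientVLLM` ultimately sends can be reproduced by hand. A minimal sketch of building such a request body is below; the `/v1/completions` path and the default port 8000 are vLLM's defaults, and the helper function name is my own, not part of either library.

```python
import json


def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for a POST to vLLM's OpenAI-compatible
    /v1/completions endpoint (served at http://<host>:8000 by default)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# Model name taken from the repro above.
body = build_completion_request("mistralai/Mistral-7B-Instruct-v0.3", "Hello")
print(json.loads(body)["model"])  # → mistralai/Mistral-7B-Instruct-v0.3
```

Sending this body with `requests.post` (header `Content-Type: application/json`) against the server on Computer 1 is a quick way to check whether the server side works before blaming the DSPy client.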
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ?
-
At @onefact we have been using wasm, but this won't work for the encoder-only or encoder-decoder models I've built (e.g. http://arxiv.org/abs/1904.05342). That's because the wasm VM is for the CPU (ha…
-
### Your current environment
Using official Docker image.
### 🐛 Describe the bug
Using Docker image: vllm/vllm-openai:latest
Params:
```
--model=mistralai/Mistral-7B-Instruct-v0.3
--gpu-memo…
```
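For reference, a minimal invocation of that image with the one complete parameter from the report might look like the sketch below; the GPU flag, port mapping, and HF cache mount are illustrative assumptions on my part, and the truncated flags from the report are deliberately left out.

```shell
# Hypothetical sketch, not the reporter's exact command.
# Port mapping and cache mount are assumptions.
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model=mistralai/Mistral-7B-Instruct-v0.3
```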
-
I ran into a series of issues trying to get vLLM stood up on a system with multiple MI210s. I figured I'd document my issues and workarounds so that someone could pick up the baton later, or at least …
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build P…
```