-
### Motivation.
OpenVINO is an open-source solution for inference of deep learning models, including LLMs. OpenVINO supports both Intel and ARM CPUs, Intel integrated and discrete GPUs, and NPUs, and has a goo…
-
**Describe the feature**
Typically, OpenAI APIs are protected with an API_KEY, so to send requests to such APIs it is necessary to specify the API_KEY in the call.
Is it planned to add such functionalit…
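For context, OpenAI-style APIs conventionally pass the key as a bearer token in the `Authorization` header of each request. A minimal client-side sketch using only the standard library (the URL, key, and model name below are placeholders, not values from this issue):

```python
import json
from urllib.request import Request

API_KEY = "sk-example"                 # placeholder key
BASE_URL = "http://localhost:8000/v1"  # placeholder server address

# OpenAI-style APIs expect "Authorization: Bearer <API_KEY>"
# on every request to a protected endpoint.
body = json.dumps({"model": "my-model", "prompt": "Hello"}).encode()
req = Request(
    f"{BASE_URL}/completions",
    data=body,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

print(req.get_header("Authorization"))  # Bearer sk-example
```

Server-side, the endpoint would compare this header against its configured key and reject mismatches with HTTP 401.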
-
Hi,
I realize that this is a big ask, but I am learning more and more about inferencing, and I've heard that VLLM tends to have better performance for multi-GPU training.
OLLAMA offers a great UX, and I…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
python3 -m vllm.entrypoints.openai.api_server --model /model/models/gemma-2-27b-it/ --dtyp…
-
When running a Qwen1.5 model, it loads but throws this error when serving:
```
handle:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-lin…
-
### The model to consider.
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
I was trying to run the exl2 quants for these models, but I am getting an error at the rotary embedding; these models us…
-
Hi. Thanks for sharing great works!
I wonder what is the role of `scale_watershed` in https://github.com/Alpha-VLLM/Lumina-T2X/blob/7bc7d7d70a20a262b4f04e873497f58f722aa224/lumina_next_t2i/models/m…
-
We already have a brief description about this proposed feature in the vLLM issue (https://github.com/vllm-project/vllm/issues/3563), but we still need a more detailed design document:
* Value prop…
-
Device info: 8×A800 80G
The launch command is as follows:
```bash
nohup python -m vllm.entrypoints.openai.api_server \
--served-model-name Qwen2-57B-A14B-Instruct \
--model /media/user/data_one/nlp_model/Qwen2-57B-A14B-I…
-
### Motivation
The `min_p` sampling parameter is becoming quite popular. It's conceptually simple and "makes sense", and (at least anecdotally, according to opinions of many model fine-tuners and u…
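To make the proposal concrete, here is my sketch of the core idea behind `min_p` sampling (not vLLM's actual implementation): tokens whose probability falls below `min_p` times the top token's probability are masked out, and the survivors are renormalized.

```python
def min_p_filter(probs, min_p=0.1):
    """Sketch of min_p sampling: drop tokens whose probability is
    below min_p * max(probs), then renormalize the remainder."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# With min_p=0.2 the threshold is 0.2 * 0.5 = 0.1,
# so the 0.05 tail token is removed before renormalization.
print(min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2))
```

Unlike a fixed `top_p` cutoff, the threshold scales with the model's confidence: a peaked distribution prunes aggressively, while a flat one keeps more candidates.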