-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Machine: A800, vLLM 0.5.0, prompt = 开始 ("start"), max output tokens = 2048, temperature set to 0.7.
vLLM…
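For reference, the settings described above correspond to the following request payload against vLLM's OpenAI-compatible server; a minimal sketch, where the model name is an illustrative placeholder:

```python
import json

# Request payload matching the reported settings: prompt "开始",
# max_tokens 2048, temperature 0.7. The model name is a placeholder
# for whatever checkpoint is actually served on the A800.
payload = {
    "model": "Qwen/Qwen2-7B-Instruct",
    "prompt": "开始",
    "max_tokens": 2048,
    "temperature": 0.7,
}

body = json.dumps(payload, ensure_ascii=False)
print(body)
```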
-
I tried to apply the Triton patch like this:
`pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall triton`
which outputs:
```
pip3 install --extra-index-url https…
```
-
As the title says, I have set up two deployment environments:
1. vLLM + Qwen2
2. Ollama + Qwen2
When calling Qwen through Spring AI, can function calling be used with either of these two deployment setups?
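Over the OpenAI-compatible endpoint, function calling amounts to sending a `tools` array with the chat request; a minimal sketch of such a payload (the model name and the `get_weather` tool schema are illustrative placeholders — whether the backend actually emits tool calls depends on the server's tool-call support):

```python
import json

# An OpenAI-style chat request carrying a function ("tool") definition.
# The model name and get_weather tool are illustrative; the serving
# backend must support tool calls for the model to use them.
request = {
    "model": "Qwen/Qwen2-7B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request, indent=2))
```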
-
### The model to consider.
Mamba Codestral: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1
Highlights:
- SOTA 7B code model
- theoretically unlimited context length; tested up to 256k
…
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
Hello,
On a container env I …
-
I defined a dataset in llmuses/benchmarks following the required format; how can I use it to evaluate a model deployed with vLLM? In Native mode I cannot find where to pass the model's address, and in OpenCompass mode custom datasets are not supported.
-
### Your current environment
4xH100.
### Model Input Dumps
_No response_
### 🐛 Describe the bug
When benchmarking the performance of vLLM with `benchmark_serving.py`, it will generate different…
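Run-to-run variation is expected whenever sampling uses a temperature above zero without a fixed random seed; the effect can be illustrated in plain Python (the vocabulary below is a toy stand-in for real token sampling):

```python
import random

def sample_tokens(seed=None, n=5):
    # Toy stand-in for temperature-based sampling: draws n "tokens"
    # from a fixed vocabulary. With no seed, each run can differ.
    rng = random.Random(seed)
    vocab = ["the", "a", "model", "vllm", "fast"]
    return [rng.choice(vocab) for _ in range(n)]

# Fixing the seed makes the draw reproducible across runs.
assert sample_tokens(seed=0) == sample_tokens(seed=0)
print(sample_tokens(seed=0))
```

The same principle applies to the benchmark: pinning the sampling seed (where the harness exposes one) trades realism for reproducibility.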
-
While testing the `--load-in-low-bit` feature with the vLLM-for-CPU example, I noticed the model is not optimized according to this option.
I found that it needs to pass in the load_in_low_bit ar…
-
### Your current environment
Problem
### 🐛 Describe the bug
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import torch
# Initialize the tokenizer
tokeniz…
```
-
### Issue Description
While using pasta for container networking on a machine with no internet connection, `podman run` always fails.
Alternatively, using `--network=slirp4netns` or `--network=none` works…