-
/kind bug
**What steps did you take and what happened:**
init storage-initializer log output:
```
2024-08-13 01:02:12.899 1 kserve INFO [in…
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyT…
-
At the moment, inference using a model from Hugging Face is only possible for autrainer models, transforms, loaders, etc., as we do not download any `.py` files.
To support custom models, the corresp…
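The current behavior can be pictured as a filter over the repo's file list that drops Python sources; a minimal sketch, assuming a hypothetical helper (the function name, flag, and filenames are illustrative, not autrainer's actual code):

```python
# Hypothetical sketch of the current download behavior: `.py` files are
# skipped, so custom model code never reaches the local cache. Supporting
# custom models would mean allowing them through (opt-in here).
def files_to_download(repo_files, allow_python=False):
    skipped = set() if allow_python else {".py"}
    return [f for f in repo_files
            if not any(f.endswith(ext) for ext in skipped)]
```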
-
When testing the curl command for the mistral7b model, this URL will not work; it is internal-only, given the 'Token authentication service not installed' error. This appears to be an issue in the case …
-
When asking a question, please provide the following information where possible:
### Basic information
- The **pretrained model** you loaded: [Robert-tiny-clue](https://github.com/CLUEbenchmark/CLUE)
### Problem description
I trained a classification model with bert4keras and saved a checkpoint via save_weights; loading it with ```model.load_weights``` and predicting works fine.
…
-
### 🚀 The feature, motivation and pitch
I need a way to specify exactly which GPU vLLM should use when multiple GPUs are available. Currently, it automatically occupies all available GPUs (https://do…
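A common workaround, pending a first-class option, is to hide the unwanted GPUs via the standard CUDA mechanism before vLLM initializes; a minimal sketch (the GPU index and model name are assumptions):

```python
import os

# Workaround sketch, not a vLLM feature: restrict which GPUs are visible
# by setting CUDA_VISIBLE_DEVICES before vLLM/torch touches CUDA.
# "1" below is an illustrative device index.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Import vLLM only after the variable is set, e.g.:
# from vllm import LLM
# llm = LLM(model="mistralai/Mistral-7B-v0.1")  # hypothetical model name
```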
-
Validating the glm4-9b-chat model's output as follows; the serving side reports an error:
```
curl --request POST \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --header 'content-type: application/json' \
  --data '{
  "model": "glm-4-9…
```
-
## Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (ei…
-
**Describe the bug**
Mixtral-based models (e.g. Prometheus) don't allow system messages to precede user messages in their template. We merge the first system message with the first user message so mess…
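The described merge can be sketched as a small message-list transform; a hypothetical illustration (function name and concatenation format are assumptions, not the actual implementation):

```python
def merge_leading_system(messages):
    # Sketch: fold a leading system message into the first user message so
    # templates that forbid system-before-user (e.g. Mixtral's) still work.
    # Messages are OpenAI-style dicts with "role" and "content" keys.
    if messages and messages[0]["role"] == "system":
        system, rest = messages[0], list(messages[1:])
        if rest and rest[0]["role"] == "user":
            rest[0] = {
                "role": "user",
                "content": system["content"] + "\n\n" + rest[0]["content"],
            }
            return rest
    return messages
```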
-
### 🚀 The feature, motivation and pitch
Hi, I'm currently working on **deploying vLLM distributed across multiple nodes in a k8s cluster**. I saw that the official documentation provides a link to [LWS…