-
### System Info
- Hardware: 8x NVIDIA H100 80GB HBM3
- Software: NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.4
- tensorrtllm_backend commit: [d173386f4dd7b3ed5…
-
Running the vLLM serving test on ARC fails with the issue below:
INFO 07-04 19:10:08 async_llm_engine.py:152] Aborted request cmpl-e5fb5cad96e9402dabbbece3611ae22f-0.
INFO: 127.0.0.1:41772 - "POST /v1/completions …
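For reference, the kind of request that hits the `/v1/completions` endpoint above can be built with the standard library alone. This is only a sketch of the OpenAI-compatible request shape; the base URL and `my-model` name are placeholders, not values from the log.

```python
import json
from urllib import request

def completion_request(base_url, model, prompt, max_tokens=16):
    """Build (but do not send) a POST request to /v1/completions.

    `model` is a placeholder; use whatever --model the server was
    launched with.
    """
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = completion_request("http://127.0.0.1:8000", "my-model", "Hello")
# To actually send it (requires a running server):
# response = request.urlopen(req)
```

Sending this against the running server while watching the log should show whether the abort correlates with client disconnects or with server-side errors.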
-
As discussed on Discord, we need to know what prompts you are using when serving the evaluation LLM.
https://hamel.dev/blog/posts/prompt/
I need to see the prompt to help debug when the framework fails or …
-
### Your current environment
docker image: vllm/vllm-openai:0.4.2
Model: https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
GPUs: RTX8000 * 2
### 🐛 Describe the bug
The model works f…
-
Hello, I'm using 24.03-trtllm-python-py3 with an image size of 8.38 GB, which is not small but acceptable.
I was going to migrate to one of the newest versions, like 24.04 or 24.05, but the image size drastically increased to 18.46 …
-
Hey,
Currently, Ollama saves models locally in a cache. To maintain different versions of LLMs or fine-tuned ones, and also for extensive monitoring, it would be a good idea to provide integration with M…
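As a starting point for such an integration, the local cache can be enumerated. The sketch below assumes Ollama's on-disk manifest layout (`~/.ollama/models/manifests/<registry>/<namespace>/<model>/<tag>`); both the path and the layout are assumptions about the cache format, not a documented API.

```python
import os

def list_cached_models(manifest_root):
    """Walk an Ollama-style manifest tree and return (model, tag) pairs.

    Assumes one file per tag, stored in a directory named after the
    model. This layout is an assumption, not a stable interface.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(manifest_root):
        for tag in filenames:
            # The directory containing the tag file is the model name.
            model = os.path.basename(dirpath)
            entries.append((model, tag))
    return sorted(entries)
```

A version-tracking or monitoring tool could periodically diff this list to detect newly pulled or removed model versions.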
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
```
-
Are you planning to integrate the vLLM package for fast LLM inference and serving?
https://vllm.readthedocs.io/en/latest/
-
**Describe the bug**
gemma-2-9b-it-gptq-4bit CUDA OOM on RTX 3090
**GPU Info**
```
Sun Aug 4 02:35:35 2024
+-----------------------------------------------------------------------…
```
-
### Installed WSL2 on Windows 11 Pro and Docker Desktop under WSL; running PyTorch code in the image raises an error
Command used to start the image:
```
docker run -itd --privileged --device=/dev/dri -v /c//models:/llm/models -v /usr/lib/wsl:/usr/lib/wsl --name=arc_vllm --s…
```