-
### Describe the bug
Crash with abort when trying to use an AMD graphics card in the editor.
Model is mistral-7b-instruct-v0.2.Q4_K_M.gguf
```
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX…
```
-
### What happened?
After https://github.com/ggerganov/llama.cpp/commit/231cff5f6f1c050bcb448a8ac5857533b4c05dc7 I'm getting errors in my app, so I decided to test the compiled releases - and was unable t…
-
Greetings, everyone. First of all, some specifications here:
1. docker version: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
2. trtllm-build --version: [TensorRT-LLM] TensorRT-LLM version:…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-21 15:29:13 _custom_ops.py:18] Failed to import from vllm._C with Im…
```
-
### Your current environment
```
[root@localhost wangjianqiang]# python -m vllm.entrypoints.openai.api_server --model /root/wangjianqiang/deepseek-moe/deepseek-coder-33b-base/ --tensor-parallel-size 8 …
```
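For context, a hedged sketch of what a request to that server would look like once it is up. The endpoint path `/v1/completions` follows the OpenAI-compatible API that vLLM exposes; the model value is taken from the command above, and the helper function itself is hypothetical:

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for a POST to /v1/completions on an
    OpenAI-compatible server (field names follow the OpenAI API schema)."""
    return json.dumps({
        "model": model,        # must match the --model path the server was started with
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

# Example body for the server launched above (model path copied from the command):
body = build_completion_request(
    "/root/wangjianqiang/deepseek-moe/deepseek-coder-33b-base/",
    "def fib(n):",
)
```

The body would then be POSTed to the server's `/v1/completions` endpoint with a `Content-Type: application/json` header.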
-
I downloaded the model linked below and ran it on the fork, but the model writes a lot of random text instead of answering the question "What is the process number?"
https://huggingface.co/openbmb/Mini…
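Output like this is often a sampling or prompt-template issue rather than a broken checkpoint. As a self-contained illustration (not the fork's actual code), the snippet below shows how the temperature parameter controls the greedy-vs-random trade-off during decoding; high temperatures are a common cause of "random text":

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Pick a token index from raw logits. temperature -> 0 means greedy
    (argmax); higher temperatures flatten the softmax distribution and
    make low-probability tokens more likely to be sampled."""
    if temperature <= 1e-6:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Inverse-CDF sampling from the softmax distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

# Greedy decoding deterministically returns the argmax token:
assert sample_token([0.1, 2.0, -1.0], temperature=0.0) == 1
```

Pinning the sampler to greedy (temperature 0, or the runtime's equivalent flag) is a quick way to separate sampling problems from genuine model or conversion bugs.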
-
### What is the issue?
The GPU is not used when running `ollama start`.
Start log:
```
ollama start
2024/09/06 06:40:42 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VI…
```
-
### Your current environment
vllm docker image: vllm/vllm-openai:latest
### 🐛 Describe the bug
It works the first time, then stops generating responses, as shown below.
```
ChatCompletion(id='c…
```
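When responses dry up like this, the first field worth inspecting is `finish_reason` on the returned choice. A minimal sketch, with a plain dict standing in for the real `ChatCompletion` object (field names follow the OpenAI response schema; the helper itself is hypothetical):

```python
def diagnose_empty_response(response: dict) -> str:
    """Explain why a chat completion may look empty, based on the
    finish_reason of the first choice (OpenAI response schema)."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    content = choice["message"]["content"] or ""
    if not content and reason == "length":
        return "hit max_tokens before producing text - raise max_tokens"
    if not content and reason == "stop":
        return "model emitted a stop token immediately - check the chat template"
    return f"finish_reason={reason}, {len(content)} chars of content"

# A response that stopped immediately (values are made up for illustration):
resp = {"choices": [{"finish_reason": "stop", "message": {"content": ""}}]}
```

An immediate `stop` with empty content usually points at the chat template or stop-token configuration rather than the server itself.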
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86…
```
-
```
ndroid/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 2…
```