-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86…
```
-
ndroid/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 2…
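For what it's worth, this check usually fires when the compiled model library and the bundled TVM runtime come from different mlc-llm revisions (the KV cache constructor gained arguments between releases). A minimal sketch of the usual remedy, assuming a source checkout of mlc-llm; paths are illustrative:

```sh
# Re-sync the TVM submodule with the current mlc-llm revision, then rebuild
# both the runtime and the model library from the same checkout so they
# agree on the KV cache constructor signature.
cd mlc-llm
git submodule update --init --recursive
```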
-
### What is the issue?
When I try `ollama run llama3.1:70b`, I get the error `Error: llama runner process has terminated: error loading model: unable to allocate backend buffer`.
```
C:\Users\sol>olla…
```
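Not from the report, but as a rough sanity check: llama3.1:70b at Ollama's default q4 quantization is roughly a 40 GB download, and loading it needs at least that much free RAM plus VRAM, with headroom for the KV cache and compute buffers. A quick way to compare (Linux commands shown; figures are approximate):

```sh
# The model must fit in available RAM + VRAM; "unable to allocate backend
# buffer" typically means it does not. Sizes below are approximate.
ollama list     # on-disk size of the pulled model (~40 GB for 70b q4)
free -h         # free system RAM
nvidia-smi      # free VRAM, if a GPU is present
```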
-
-------- env lib detail --------
inf2.24xlarge
ubuntu@ip-172-31-12-212:~/vllm$ pip list|grep -i neuron
aws-neuronx-runtime-discovery 2.9
libneuronxla 2.0.755
neuro…
-
### What is the issue?
I encountered an error while attempting to run both the q8_0 and q4_K_M quantizations.
`Error: llama runner process has terminated: error loading model: error loading model vocabulary: wstrin…
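Not part of the original report, but a `wstring`-related vocabulary failure is often a corrupted or truncated GGUF download; re-pulling the model is a cheap first check (the model name below is a placeholder):

```sh
# Force a fresh copy of the model in case the local blob is corrupted.
ollama rm <model>
ollama pull <model>
```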
-
### Your current environment
```text
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
```
-
### Your current environment
```text
The output of `python collect_env.py`
WARNING 11-10 22:34:11 cuda.py:76] Detected different devices in the system:
WARNING 11-10 22:34:11 cuda.py:76] NVIDIA G…
```
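One hedged workaround for the mixed-device warning, assuming the host really does contain different GPU models: pin the process to an identical subset with `CUDA_VISIBLE_DEVICES` (the indices below are placeholders):

```sh
nvidia-smi -L                                    # list GPUs and their indices
CUDA_VISIBLE_DEVICES=0,1 python collect_env.py   # restrict to matching devices
```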
-
# Prerequisites
ROCm 6
# Expected Behavior
Attempting to utilize llama_cpp_python in the Oobabooga WebUI
# Current Behavior
It loads the model into VRAM. Then, upon trying to infer:
gml…
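A common first step in this situation (not confirmed as the fix here) is making sure llama-cpp-python was actually built against ROCm rather than installed as a CPU-only wheel; `LLAMA_HIPBLAS` was the documented CMake switch for builds of this vintage (newer releases renamed it to `GGML_HIPBLAS`):

```sh
# Rebuild llama-cpp-python with the hipBLAS (ROCm) backend enabled.
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```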
-
### What is the issue?
I have pulled a couple of LLMs via Ollama. When I run any of them, the response is very slow, so much so that I can type faster than the model generates text.
My system speci…
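A sketch of the usual first diagnostic, assuming an NVIDIA GPU: typing-speed responses almost always mean the model is running partly or fully on CPU, which `ollama ps` reports directly:

```sh
ollama ps      # the PROCESSOR column shows the CPU/GPU split, e.g. "100% GPU"
nvidia-smi     # confirm the ollama runner process is resident on the GPU
```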
-
### Prerequisites
- [ ] I have searched the existing issues
### Current behavior
error log below
By the way, the same model and the same mmproj file work with koboldcpp, so maybe you can copy-paste from there ;)
### Minimum repro…