-
### What happened?
`GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02`
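In ggml, `ne2` and `ne02` are tensor extents (`ne[2]`, an outer, batch-like dimension), so the abort means two tensors involved in an op disagree on that dimension. A hedged PyTorch analogy of the same class of failure, not the actual ggml code:

```python
import torch

# Batched matmul with deliberately mismatched batch dimensions: the same
# kind of shape constraint that `ne2 == ne02` guards in ggml.
a = torch.randn(8, 4, 16)  # batch dim = 8
b = torch.randn(2, 16, 4)  # batch dim = 2, mismatched on purpose
try:
    torch.bmm(a, b)
except RuntimeError as err:
    print(err)  # PyTorch reports the mismatch instead of aborting
```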
### Name and Version
```
version: 2965 (03d8900e)
built with MSVC 19.39.33523.0 for x64
```
### What operati…
-
### System Info
- CPU: x86
- Memory: over 300 GB
- GPU: 8 x V100
- No InfiniBand, no NVLink; NCCL uses sockets for communication (see the sketch below)
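For reference, a socket-only setup like this is usually pinned down explicitly so NCCL does not probe for transports that are absent. A minimal sketch, assuming a PyTorch/NCCL job on this box; the `eth0` interface name is an assumption, not a detail from the report:

```python
import os

# Force NCCL onto plain TCP sockets on a host without IB or NVLink.
os.environ["NCCL_IB_DISABLE"] = "1"        # no InfiniBand present
os.environ["NCCL_P2P_DISABLE"] = "1"       # no NVLink peer-to-peer
os.environ["NCCL_SOCKET_IFNAME"] = "eth0"  # NIC for the socket transport (assumed name)
# Set these before torch.distributed.init_process_group(backend="nccl").
```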
Driver:
```
+--------------------------------------------------------------…
```
-
### System Info
tensorrt-llm version: 0.11.0.dev2024062500
Architecture: x86_64
CPU: AMD EPYC 9354 32-Core Processor
```txt
+----------------------------------------------------------…
```
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24 GB…
-
The GPU memory usage continues to increase after each round while finetuning an LLM with an adapter, and the increment after each round is approximately the same. I speculate it's because th…
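The report is cut off, but an equal-sized increment every round matches a well-known PyTorch pattern: keeping a reference to the loss tensor, which retains its autograd history. A minimal sketch under that assumption; the model and loop are hypothetical, not the reporter's code:

```python
import torch

# Storing the loss *tensor* keeps its autograd history alive, so allocated
# memory grows by roughly the same amount every round.
dev = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024, device=dev)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
history = []

for rnd in range(3):
    x = torch.randn(64, 1024, device=dev)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    history.append(loss)  # leak: retains history; use loss.item() instead
    if dev == "cuda":
        print(rnd, torch.cuda.memory_allocated() // 2**20, "MiB allocated")
```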
-
### What is the issue?
When running deepseek-coder-v2:16b on an NVIDIA GeForce RTX 3080 Laptop GPU, I get this crash report:
```
Error: llama runner process has terminated: signal: aborted (core dump…
```
-
### Your current environment
```text
The output of `python collect_env.py`
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
```
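Where `collect_env.py` itself cannot be run, the first few fields above can be approximated by hand. A hedged sketch using only public `torch` attributes:

```python
import torch

# Reproduce the leading collect_env.py fields manually.
print("PyTorch version:", torch.__version__)
print("Is debug build:", torch.version.debug)
print("CUDA used to build PyTorch:", torch.version.cuda)
print("ROCM used to build PyTorch:", torch.version.hip)
```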
-
Affected version: v0.7.1 (main)
I initially assumed the issue was with my system: outdated NVIDIA drivers, CUDA, etc. But after trying on four separate machines running different mixes …
-
### What is the issue?
```
~$ nvidia-smi
Fri May 24 09:41:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 …
```
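For scripted triage, the same GPU state can be captured programmatically. A hedged sketch using standard `nvidia-smi` query options:

```python
import subprocess

# Query name, driver version, and memory usage as machine-readable CSV.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,driver_version,memory.used,memory.total",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
```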
-
### What happened?
I am using llama.cpp + SYCL to perform inference on a multi-GPU server. However, I get a segmentation fault when using multiple GPUs. The same model can produce inference output…
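A common way to narrow such a fault down is to force the whole model onto a single SYCL device. A hedged sketch using llama.cpp's `--split-mode`/`--main-gpu` options; the binary name and model path are assumptions, not details from the report:

```python
import subprocess

# Run llama.cpp pinned to one SYCL device to rule out the multi-GPU path.
subprocess.run([
    "./llama-cli", "-m", "model.gguf",
    "--split-mode", "none",  # keep all layers on a single device
    "--main-gpu", "0",       # index of the device to use
    "-p", "Hello",
], check=True)
```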