-
**Question**
Hello, it is very nice to see this interesting work.
However, when trying to reproduce the results in Table 4, the throughput using ShadowKV is much lower than reported.
As shown in t…
-
### What happened?
When trying to run [FatLlama-1.7T-Instruct](https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct), llama.cpp crashes while loading the model with the error: `n > N_MAX: 525…
-
### What is the issue?
I had compiled ollama from source and it worked fine. Recently I rebuilt it at the latest version, and it no longer seems to use my GPU (it spawns a lot of CPU processes, and it …
-
### What happened?
I can't use Docker + SYCL with `-ngl > 0`; with `-ngl 0` it works fine.
Error message:
No kernel named _ZTSZZL17rms_norm_f32_syclPKfPfiifPN4sycl3_V15queueEiENKUlRNS3_7handlerEE0_c…
-
### System Info
- CPU: x86_64, Intel(R) Xeon(R) Platinum 8470
- CPU/Host memory size: 1TB
- GPU:
4xH100 96GB
- Libraries
TensorRT-LLM: main, 0.15.0 (commit: b7868dd1bd1186840e3755b97ea3d3a73dd…
-
I have a 3090 with 24 GB of VRAM. I tried to generate with 2 images and I get this error:
Traceback (most recent call last):
File "F:\OmniGen\venv\lib\site-packages\gradio\queueing.py", line 624, in p…
-
#### System information
Erigon version: `3.00.0-alpha5-78f3647d`
OS & Version: Ubuntu 22.04.4 LTS x86_64
Commit hash: [`78f3647d`](https://github.com/erigontech/eri…
-
**Describe the bug**
When I use "Get token from vault", the default path `secret/vaultPass/` is used instead of the one specified in the KV Store f…
-
**Describe the bug**
Following the [readme](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_kv_cache/README.md) here, I cannot get an fp8 weight, activation, and kv cache quant…
-
### Your current environment
I get the following warning when running a quantized version of `gemma2`, even though I have not quantized the KV cache:
```bash
WARNING 08-11 22:31:50 gemma2.py:399] Some weig…