-
This issue is the "disk read" counterpart to https://github.com/cockroachdb/cockroach/issues/17500, which was addressed by https://github.com/etcd-io/raft/pull/8 and https://github.com/cockroachdb/coc…
-
Hello, I would like to know whether the inference times reported in Figure 4 are measured WITHOUT a KV cache, while the "TPS" results in Table 3 are the prefill time (first-token inference time)?
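For intuition on why the KV-cache setting matters for these numbers: without a cache, every decode step recomputes key/value projections over the whole growing sequence, so cost grows quadratically with generated length. A tiny sketch of my own (the function and its name are illustrative, not from the paper) counting per-step key/value projections:

```python
def kv_projection_count(prompt_len, new_tokens, use_kv_cache):
    """Count how many key/value projections are computed during generation.

    With a cache, each step only projects the single new token; without one,
    the whole growing sequence is reprojected at every step.
    """
    total = 0
    seq_len = prompt_len
    for _ in range(new_tokens):
        total += 1 if use_kv_cache else seq_len
        seq_len += 1
    return total

# e.g. a 100-token prompt, generating 50 tokens:
# kv_projection_count(100, 50, use_kv_cache=True)  -> 50
# kv_projection_count(100, 50, use_kv_cache=False) -> 100 + 101 + ... + 149 = 6225
```

That gap is why "with cache" and "without cache" timings are not directly comparable, and why it matters which setting Figure 4 uses.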
-
To make Restate fully highly available, we also need the metadata store to be highly available. We either need to find a KV store that provides linearizable reads and writes, or we need to build it our…
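For reference, a minimal in-memory stand-in for the kind of interface such a metadata store would expose: linearizable reads plus version-guarded (compare-and-swap) writes. All names here are hypothetical sketches, not Restate's actual API, and a real implementation would have to serve these operations through consensus (e.g. Raft) rather than from a single process:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Versioned:
    version: int
    value: bytes


class MetadataStore:
    """Hypothetical single-process stand-in for a linearizable metadata KV store."""

    def __init__(self):
        self._data = {}

    def get(self, key: str) -> Optional[Versioned]:
        # A real store must serve reads through consensus (or a leader lease)
        # to guarantee linearizability, never from a possibly stale replica.
        return self._data.get(key)

    def put_if_version(self, key: str, value: bytes, expected_version: int) -> bool:
        """Write only if the stored version matches; versions start at 0 (absent)."""
        current = self._data.get(key)
        current_version = current.version if current else 0
        if current_version != expected_version:
            return False  # lost the race; the caller re-reads and retries
        self._data[key] = Versioned(current_version + 1, value)
        return True
```

The version-guarded write is what lets multiple nodes coordinate safely on top of the store: a stale writer fails the CAS instead of silently overwriting newer metadata.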
-
This is my driver:
![image](https://github.com/user-attachments/assets/582c828f-19f6-431c-9ad1-215ea07b9cbd)
I ran the whisper.net\examples\NvidiaCuda project, and input a 00:01:48 duration with 11.7mb s…
-
### Your current environment
(latest docker image `vllm/vllm-openai:latest`)
```text
root@68ac2e4db323:/vllm-workspace# python3 collect_env.py
Collecting environment information...
PyTorch versi…
jphme updated 1 month ago
-
In the `update_kv` function of the `H2OKVCluster` class, I see this code:
```
attn_weights = torch.matmul(query_states[..., -self.window_size:, :], key_states.transpose(2, 3)) / math.sqrt(head…
-
**Is your feature request related to a problem? Please describe.**
Many students, including me, struggle to learn the roadmap for becoming an Android developer.
**Describe the solution you'd like**
I will c…
-
In https://github.com/FuelLabs/fuel-core/pull/2142 we introduced benchmarks that were not very clean, and part of the rework has been addressed in:
- https://github.com/FuelLabs/fuel-core/pull/2168…
rymnc updated 2 months ago
-
I tried two GGUF conversions on an M2 Ultra (Metal), but no luck. I converted the models myself and still got the same error.
Here is the first model I tried:
https://huggingface.co/guinmoon/MobileVLM-1.7B-GGUF…
-
### What is the issue?
When using the llm-benchmark tool (https://github.com/MinhNgyuen/llm-benchmark) with Ollama, I get around 80 t/s with Gemma 2 2B. When asking the same questions to llama.cpp in conve…
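One common source of such gaps is that runners disagree on what counts toward t/s: some include prompt-processing (prefill) time, others only the decode phase. A minimal sketch (my own helper, not part of either tool) that measures decode throughput consistently by excluding the time before the first token:

```python
import time

def tokens_per_second(token_stream):
    """Compute decode throughput from an iterable of streamed tokens.

    Time spent before the first token arrives (prefill) is excluded;
    including it makes short generations look much slower than they are.
    """
    it = iter(token_stream)
    try:
        next(it)  # first token marks the end of prefill
    except StopIteration:
        return 0.0  # nothing was generated
    start = time.perf_counter()
    count = 0
    for _ in it:
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Applying the same definition to both Ollama and llama.cpp output streams makes the two numbers directly comparable.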