-
**What is your question?**
Hello!
I’ve been exploring the CUTLASS examples for GEMM and Convolution and noticed the use of double buffering.
https://developer.nvidia.com/blog/cutlass-linear-algebra-…
-
## Protocol Change Proposal
### Summary
Increase CometBFT's throughput by allowing nodes to exchange ordered transactions during the voting steps of the consensus algorithm.
### Problem Def…
-
Floorplan check_setup
--------------------------------------------------------------------------
Warning: There are 148 input ports missing set_input_delay.
Warning: There are 238 output ports miss…
-
## 🚀 Feature
Please add Lookahead Decoding to mlc-llm in C++. We need it to speed up LLM decoding on **mobile devices**.
Reference: https://github.com/hao-ai-lab/LookaheadDecoding
## Motivation
…
-
**Describe the feature request**
Make `third-party-jwt` jwtPolicy token expiration configurable.
**Describe alternatives you've considered**
None because we want to stay with the recommended…
-
### Anything you want to discuss about vllm.
Hi,
I am running some benchmarks on the `vllm.entrypoints.openai.api_server`, measuring latency and throughput with different numbers of concurrent reque…
-
### 🚀 The feature, motivation and pitch
### foreword and motivation
This is a foreword on mutable states and the forward pass.
Compared with the past, people are now writing models with more types…
-
### Who is this for, and what problem do they have today?
The default chunk size for Redpanda is 16 KB. This means that for every IOP performed by Redpanda, 16 KB of data is written to disk. This de…
-
I've been working on an [OpenAI-compatible REST server](https://github.com/guidance-ai/llgtrt), utilizing TensorRT-LLM but not Triton, similar to `openai_server.py` but in Rust and generally productio…
-
It would be nice if GoodJob could be configured to disable/ignore priority as a job feature. e.g. `config.good_job.enable_active_job_priority = false`. Why?
- For performance reasons: it's not possib…