latency-constrained Search Results

NVIDIA/cutlass #1789

[QST] Understanding double buffering in GEMM kernels

**What is your question?** Hello! I’ve been exploring the Cutlass examples for GEMM and Convolution and noticed the use of double buffering. https://developer.nvidia.com/blog/cutlass-linear-algebra-…

phantaurus updated 2 weeks ago

cometbft/cometbft #4222

Enhancing CometBFT Performance: Utilizing Votes to Append Tr…

## Protocol Change Proposal ### Summary Increase CometBFT's throughput by allowing nodes to exchange ordered transactions during the voting steps of the consensus algorithm. ### Problem Def…

nenadmilosevic95 updated 4 days ago

The-OpenROAD-Project/megaboom #160

Floorplan check_setup complains about unconstrained pins

Floorplan check_setup -------------------------------------------------------------------------- Warning: There are 148 input ports missing set_input_delay. Warning: There are 238 output ports miss…

jeffng-or updated 1 week ago

mlc-ai/mlc-llm #2769

[Feature Request] Lookahead Decoding support

## 🚀 Feature Please add Lookahead Decoding to mlc-llm in C++; we needed it to speed up LLM decoding on **mobile devices.** Refers to: https://github.com/hao-ai-lab/LookaheadDecoding ## Motivation …

MrRace updated 2 months ago

istio/istio #52108

Make `third-party-jwt` jwtPolicy token expiration configurab…

**Describe the feature request** Make `third-party-jwt` jwtPolicy token expiration configurable. **Describe alternatives you've considered** None because we want to stay with the recommended…

sac-outsystems updated 1 week ago

vllm-project/vllm #3567

[Misc]: Throughput/Latency for guided_json with ~100% GPU ca…

### Anything you want to discuss about vllm. Hi, I am running some benchmarks on the `vllm.entrypoints.openai.api_server` measuring latency and throughput with different number of concurrent reque…

jens-create updated 2 days ago

pytorch/executorch #4740

Support for dynamic caches

### 🚀 The feature, motivation and pitch ### foreword and motivation This is a foreword on mutable states and the forward pass. Compared to history, people are now writing models with more types…

awgr updated 2 months ago

redpanda-data/redpanda #3776

Make it possible to set `append_chunk_size` on a per topic b…

### Who is this for, and what problem do they have today? The default chunk size for Redpanda is 16KB. This means that for every IOP performed by Redpanda, 16 KB of data is written to disk. This de…

rkruze updated 6 months ago

NVIDIA/TensorRT-LLM #2365

fast-forward tokens in logits post processor

I've been working on an [OpenAI-compatible REST server](https://github.com/guidance-ai/llgtrt), utilizing TensorRT-LLM but not Triton, similar to `openai_server.py` but in Rust and generally productio…

mmoskal updated 4 days ago

bensheldon/good_job #1065

Allow priority to be globally disabled/ignored

It would be nice if GoodJob could be configured to disable/ignore priority as a job feature. e.g. `config.good_job.enable_active_job_priority = false`. Why? - For performance reasons: it's not possib…

bensheldon updated 1 week ago

1000+ results for latency-constrained
