-
## Description
Consider adding an additional FusedCrossEntropyLoss kernel to the FOAK set of kernels, given the additional improvement seen when using it in earlier tests (see Background below).
Considerati…
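For readers unfamiliar with what such a kernel fuses, a minimal sketch of the underlying computation follows: cross-entropy evaluated directly from logits in one pass (log-sum-exp plus negative log-likelihood), without materializing the softmax output. This is illustrative only; the actual FOAK kernel performs this fused on the GPU, and the function name here is hypothetical.

```python
import math

def fused_cross_entropy(logits, target):
    """Cross-entropy from raw logits in a single pass.

    `logits` is a list of floats, `target` the index of the true class.
    A fused kernel computes this without writing out the softmax; this
    plain-Python version just shows the math being fused.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]  # -log softmax(logits)[target]

loss = fused_cross_entropy([2.0, 0.5, -1.0], target=0)
```

Fusing these steps avoids one full read/write of the logits tensor, which is where the observed speedup comes from.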
-
Currently CAGRA supports PQ compression with `pq_len=2` and `pq_len=4`. A larger compression ratio can be achieved if we allow larger `pq_len` values, e.g. 8 and 16.
`pq_len` is a [template paramete…
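The compression-ratio arithmetic can be sketched as below. This assumes fp32 input vectors and 8-bit PQ codebooks (one byte per encoded subvector), which is a common default; the function name and defaults are illustrative, not CAGRA API.

```python
def pq_compression_ratio(dim, pq_len, bytes_per_code=1, dtype_bytes=4):
    """Ratio of raw vector size to PQ-encoded size.

    Each `pq_len`-dimensional subvector is replaced by a single code
    (1 byte for 8-bit codebooks). Assumes dim % pq_len == 0.
    """
    n_subspaces = dim // pq_len
    return (dim * dtype_bytes) / (n_subspaces * bytes_per_code)

# For 128-dim fp32 vectors: pq_len=2 -> 8x, 4 -> 16x, 8 -> 32x, 16 -> 64x
for pq_len in (2, 4, 8, 16):
    print(pq_len, pq_compression_ratio(128, pq_len))
```

Doubling `pq_len` halves the number of codes per vector, so each supported doubling directly doubles the compression ratio.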
-
Some programs have issues when running on kernels with larger page sizes.
### jemalloc
One common case is programs (especially Rust programs) that use jemalloc. This was fixed in #48194 for the j…
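For context on why this breaks: jemalloc's page size is fixed at build time (via `--with-lg-page`), and a binary whose jemalloc was built for smaller pages than the running kernel uses will fail at startup. A small sketch of checking the runtime page size, assuming a POSIX system:

```python
import os

# Runtime page size of the kernel this process is running on.
# jemalloc's page size is a compile-time constant (--with-lg-page);
# if it is smaller than this value, jemalloc-linked binaries abort
# at startup rather than misbehave silently.
page_size = os.sysconf("SC_PAGESIZE")
lg_page = page_size.bit_length() - 1  # e.g. 12 for 4 KiB, 16 for 64 KiB
print(page_size, lg_page)
```

Distributions targeting 64 KiB-page kernels therefore need jemalloc built with a matching (or larger) `--with-lg-page` value.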
-
As of early 2024, the ROCm compiler force-inlines *everything*.
While generally nice, this can be problematic for very large kernels at both compile time and runtime, if we actually want to…
-
I really love this project and the accompanying blogpost, so thanks! I've reimplemented some of the inference techniques to speed up an implementation of Whisper that I am using. I had a few questions…
-
Here is my understanding of the existing state of things and what I think we should be doing to make our lower-bit kernels more performant at both small and larger batch sizes. I'm making this an RFC …
-
**What is your question?**
Hello!
I’ve been exploring the Cutlass examples for GEMM and Convolution and noticed the use of double buffering.
https://developer.nvidia.com/blog/cutlass-linear-algebra-…
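To make the double-buffering pattern concrete, here is a sequential Python stand-in for it: while tile *k* is being consumed, tile *k+1* is staged into the other of two ping-pong buffers. In CUTLASS the "load" is an asynchronous global-to-shared-memory copy that overlaps with the math; this sketch only models the buffer choreography, and all names are illustrative.

```python
def double_buffered_sum(tiles):
    """Process tiles through two staging buffers (ping-pong).

    Loads tile k+1 into one buffer while "computing" on tile k in the
    other, mirroring how a GPU pipeline overlaps copies with math.
    """
    if not tiles:
        return 0
    bufs = [None, None]
    bufs[0] = list(tiles[0])  # prologue: prefetch the first tile
    total, cur = 0, 0
    for k in range(len(tiles)):
        nxt = cur ^ 1
        if k + 1 < len(tiles):
            bufs[nxt] = list(tiles[k + 1])  # "async" load of next tile
        total += sum(bufs[cur])             # compute on current tile
        cur = nxt                           # swap buffers
    return total

print(double_buffered_sum([[1, 2], [3, 4], [5, 6]]))  # 21
```

The prologue load before the main loop is the key structural feature: it keeps the compute stage one tile behind the load stage, so neither ever waits on an empty buffer.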
-
When employing the pocoMC package for Bayesian inference runs using tellurium for modeling, we have encountered issues with parallelization. Using multiprocess(ing), we noticed a very large discrepancy …
-
### 🚀 The feature, motivation and pitch
I propose implementing int8 quantization support for vLLM, focusing initially on the KV cache. This feature will allow users to run larger models or increase b…
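The core arithmetic of the proposed scheme can be sketched as symmetric per-tensor int8 quantization, which is the approach commonly used for KV-cache compression: a single scale maps the tensor into [-127, 127] codes. This is a plain-Python illustration under that assumption, not vLLM's actual kernel code.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization.

    scale = max|x| / 127; codes are clamped to [-127, 127]. Halves KV
    memory vs fp16 (quarters it vs fp32) at the cost of rounding error.
    """
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in q]

vals = [0.02, -1.5, 0.75, 3.0]
q, s = quantize_int8(vals)
approx = dequantize_int8(q, s)
```

The per-element error is bounded by half a quantization step (`scale / 2`), which is the accuracy/memory trade-off the proposal would need to evaluate on real KV tensors.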
-
## Description
We need to create an AWS architecture that meets our requirements for SPICE processing.
## Requirements
- Nail down which kernels will be delivered from MOC (MOC -> POC -> SDC)
- Low …