flashinfer-ai flashinfer issues

FlashInfer: Kernel Library for LLM Serving

https://flashinfer.ai

Apache License 2.0

760 stars, 64 forks

perf: split kv-cache for prefill/append kernels

#310 yzh119 closed 2 weeks ago
0
perf: use cub's native BlockLoad/BlockStore for sampling kernels

#309 yzh119 opened 2 weeks ago
0
perf: use packed bit array for attention mask

#308 yzh119 closed 2 weeks ago
0
refactor: use combined div/mod for write lse

#307 yzh119 closed 2 weeks ago
0
refactor: remove `page_size` from template parameters for prefill kernels

#306 yzh119 closed 2 weeks ago
0
Faster compile/ci

#305 Qubitium closed 2 weeks ago
0
perf: optimize warp layout for prefill operator

#304 yzh119 closed 2 weeks ago
0
test: fix fp8 calibration test

#303 yzh119 closed 2 weeks ago
0
test: fix unittest for group gemm

#302 yzh119 closed 2 weeks ago
0
refactor: move `gqa_group_size` from template parameter to input arguments

#301 yzh119 closed 2 weeks ago
0
doc: fix logits cap docstring

#300 yzh119 closed 3 weeks ago
0
doc: fix the description of logits cap in docstring

#299 yzh119 closed 3 weeks ago
0
feat: initial support of logits hook

#298 yzh119 closed 3 weeks ago
0
bugfix: suppress alignment warning of sampling kernels

#297 yzh119 closed 3 weeks ago
0
bugfix: fix wrong `padded_batch_size_`

#296 yzh119 closed 3 weeks ago
0
[Q&A] Cutlass and contributing

#295 jeromeku closed 3 weeks ago
3
refactor: refactor decode handler

#294 yzh119 closed 3 weeks ago
0
misc: add some notes in `cmake.config`

#293 yzh119 closed 3 weeks ago
0
doc: fix the math display of group gemm operator

#292 yzh119 closed 3 weeks ago
0
bugfix: Fix the behavior of decode cuda graph wrapper

#291 yzh119 closed 3 weeks ago
0
bugfix: fix the synchronization issue in distributed operators

#290 yzh119 closed 3 weeks ago
0
feat: initial support of distributed operators

#289 yzh119 closed 3 weeks ago
0
[DO NOT MERGE] cudagraph: use cuda dynamic parallelism to dispatch kernels

#288 yzh119 opened 3 weeks ago
1
cmake: fix DECODE_F8_DTYPES and DECODE_FP8_DTYPES discrepancy

#287 ibsidorenko closed 4 weeks ago
2
feat: Separate Q and KV dtypes for decode

#286 Yard1 closed 3 weeks ago
7
[Q&A] Any plans for different dtypes for Q (query) and KV (kv-cache)?

#285 ibsidorenko closed 3 weeks ago
4
bugfix: fix the data type of aligned_alloc in handlers

#283 yzh119 closed 1 month ago
0
feat: add group gemm operators

#282 yzh119 closed 1 month ago
1
bugfix: fix cudagraph-compatible prefill/decode apis

#281 yzh119 closed 1 month ago
0
Add dtype checks for q-kv tensors

#280 Yard1 closed 1 month ago
0
misc: suppress compilation warning of fastdiv

#279 yzh119 closed 1 month ago
0
perf: add fastdiv for uint32_t

#278 yzh119 closed 1 month ago
0
feat: support cuda graph for batched multi-query(prefill/append) attention

#277 yzh119 closed 1 month ago
0
Revert "feat: support cuda graph for batched multi-query(prefill/append) attention"

#276 yzh119 closed 1 month ago
0
feat: support cuda graph for batched multi-query(prefill/append) attention

#275 yzh119 closed 1 month ago
0
hotfix: fix setup.py

#274 yzh119 closed 1 month ago
0
fp8: add calibration scale for decode attention operators

#273 yzh119 closed 1 month ago
2
git: ignore generated directory in documentation

#272 yzh119 closed 1 month ago
0
doc: add some documentation for attention with mask API

#271 yzh119 closed 1 month ago
0
doc: update documentation for mask layout

#270 yzh119 closed 1 month ago
0
3rdparty: add dependency to cutlass and composable kernels

#269 yzh119 closed 1 month ago
0
3rdparty: add mscclpp dependency

#268 yzh119 closed 1 month ago
0
bugfix: avoid potential illegal memory access

#267 yzh119 closed 1 month ago
0
feat: support custom attention mask in prefill/append attention kernels

#266 yzh119 closed 1 month ago
0
bugfix: use `FlagHeads` instead of `SubtractLeft` for CUDA 11.8

#265 yzh119 closed 1 month ago
0
doc: bugfix in kv-layout docs

#264 yzh119 closed 1 month ago
0
doc: update documentation

#263 yzh119 closed 1 month ago
0
[WIP] refactor: make `gqa_group_size` a function argument instead of template parameter

#262 yzh119 closed 2 weeks ago
1
build raise "cub::BlockAdjacentDifference<__nv_bool, 1024, 1, 1, 860>" has no member "SubtractLeft"

#261 WanBenLe closed 1 month ago
8
sampling: fused speculative sampling kernels

#259 yzh119 closed 1 month ago
0