flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0
760 stars
64 forks
issues
perf: split kv-cache for prefill/append kernels
#310
yzh119
closed
2 weeks ago
0
perf: use cub's native BlockLoad/BlockStore for sampling kernels
#309
yzh119
opened
2 weeks ago
0
perf: use packed bit array for attention mask
#308
yzh119
closed
2 weeks ago
0
refactor: use combined div/mod for write lse
#307
yzh119
closed
2 weeks ago
0
refactor: remove `page_size` from template parameters for prefill kernels
#306
yzh119
closed
2 weeks ago
0
Faster compile/ci
#305
Qubitium
closed
2 weeks ago
0
perf: optimize warp layout for prefill operator
#304
yzh119
closed
2 weeks ago
0
test: fix fp8 calibration test
#303
yzh119
closed
2 weeks ago
0
test: fix unittest for group gemm
#302
yzh119
closed
2 weeks ago
0
refactor: move `gqa_group_size` from template parameter to input arguments
#301
yzh119
closed
2 weeks ago
0
doc: fix logits cap docstring
#300
yzh119
closed
3 weeks ago
0
doc: fix the description of logits cap in docstring
#299
yzh119
closed
3 weeks ago
0
feat: initial support of logits hook
#298
yzh119
closed
3 weeks ago
0
bugfix: suppress alignment warning of sampling kernels
#297
yzh119
closed
3 weeks ago
0
bugfix: fix wrong `padded_batch_size_`
#296
yzh119
closed
3 weeks ago
0
[Q&A] Cutlass and contributing
#295
jeromeku
closed
3 weeks ago
3
refactor: refactor decode handler
#294
yzh119
closed
3 weeks ago
0
misc: add some notes in `cmake.config`
#293
yzh119
closed
3 weeks ago
0
doc: fix the math display of group gemm operator
#292
yzh119
closed
3 weeks ago
0
bugfix: Fix the behavior of decode cuda graph wrapper
#291
yzh119
closed
3 weeks ago
0
bugfix: fix the synchronization issue in distributed operators
#290
yzh119
closed
3 weeks ago
0
feat: initial support of distributed operators
#289
yzh119
closed
3 weeks ago
0
[DO NOT MERGE] cudagraph: use cuda dynamic parallelism to dispatch kernels
#288
yzh119
opened
3 weeks ago
1
cmake: fix DECODE_F8_DTYPES and DECODE_FP8_DTYPES discrepancy
#287
ibsidorenko
closed
4 weeks ago
2
feat: Separate Q and KV dtypes for decode
#286
Yard1
closed
3 weeks ago
7
[Q&A] Any plans for different dtypes for Q (query) and KV (kv-cache)?
#285
ibsidorenko
closed
3 weeks ago
4
bugfix: fix the data type of aligned_alloc in handlers
#283
yzh119
closed
1 month ago
0
feat: add group gemm operators
#282
yzh119
closed
1 month ago
1
bugfix: fix cudagraph-compatible prefill/decode apis
#281
yzh119
closed
1 month ago
0
Add dtype checks for q-kv tensors
#280
Yard1
closed
1 month ago
0
misc: suppress compilation warning of fastdiv
#279
yzh119
closed
1 month ago
0
perf: add fastdiv for uint32_t
#278
yzh119
closed
1 month ago
0
feat: support cuda graph for batched multi-query(prefill/append) attention
#277
yzh119
closed
1 month ago
0
Revert "feat: support cuda graph for batched multi-query(prefill/append) attention"
#276
yzh119
closed
1 month ago
0
feat: support cuda graph for batched multi-query(prefill/append) attention
#275
yzh119
closed
1 month ago
0
hotfix: fix setup.py
#274
yzh119
closed
1 month ago
0
fp8: add calibration scale for decode attention operators
#273
yzh119
closed
1 month ago
2
git: ignore generated directory in documentation
#272
yzh119
closed
1 month ago
0
doc: add some documentation for attention with mask API
#271
yzh119
closed
1 month ago
0
doc: update documentation for mask layout
#270
yzh119
closed
1 month ago
0
3rdparty: add dependency to cutlass and composable kernels
#269
yzh119
closed
1 month ago
0
3rdparty: add mscclpp dependency
#268
yzh119
closed
1 month ago
0
bugfix: avoid potential illegal memory access
#267
yzh119
closed
1 month ago
0
feat: support custom attention mask in prefill/append attention kernels
#266
yzh119
closed
1 month ago
0
bugfix: use `FlagHeads` instead of `SubtractLeft` for cuda 118
#265
yzh119
closed
1 month ago
0
doc: bugfix in kv-layout docs
#264
yzh119
closed
1 month ago
0
doc: update documentation
#263
yzh119
closed
1 month ago
0
[WIP] refactor: make `gqa_group_size` a function argument instead of template parameter
#262
yzh119
closed
2 weeks ago
1
build raise "cub::BlockAdjacentDifference<__nv_bool, 1024, 1, 1, 860>" has no member "SubtractLeft"
#261
WanBenLe
closed
1 month ago
8
sampling: fused speculative sampling kernels
#259
yzh119
closed
1 month ago
0