flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0 · 1.46k stars · 142 forks
Issues
#633 · bugfix: fix sliding window attention tests of ragged attention api (yzh119, closed 4 hours ago, 0 comments)
#632 · perf: speedup jit compilation of prefill attention kernels (yzh119, closed 4 hours ago, 0 comments)
#631 · misc: remove duplicate norm cuda kernels (yzh119, closed 11 hours ago, 0 comments)
#630 · misc: add page ops to shared-prefix kernel unittest warmup function (yzh119, closed 12 hours ago, 0 comments)
#629 · feat: warmup for jit kernel tests (yzh119, closed 14 hours ago, 0 comments)
#628 · jit: further accelerate compilation by splitting files and multi-threading (yzh119, closed 16 hours ago, 0 comments)
#627 · Allow the cascade kernels to be executed using varying sequence lengths (nandor, closed 16 hours ago, 0 comments)
#626 · CUDA Graph support for prefill kernels with varying `qo_indptr` (nandor, opened 2 days ago, 0 comments)
#625 · bugfix: fix append_paged_kv_cache test (QiJune, closed 3 days ago, 0 comments)
#624 · bugfix: fix prefill kernel uris for aot compilation (yzh119, closed 3 days ago, 0 comments)
#623 · hotfix: fix aot compilation after #618 (yzh119, closed 4 days ago, 0 comments)
#622 · feat: add an option `non_blocking` to plan function (yzh119, closed 4 days ago, 0 comments)
#621 · refactor: rename num_frags to num_mma (yzh119, closed 4 days ago, 0 comments)
#620 · bugfix: fix MLA with new JIT pipeline (yzh119, closed 4 days ago, 0 comments)
#619 · bugfix: fix the rope correctness issue introduced in #609 (yzh119, closed 4 days ago, 0 comments)
#618 · perf: accelerate JIT compilation speed (yzh119, closed 4 days ago, 1 comment)
#617 · [Announcements] support FlashInfer nightly (zhyncs, opened 5 days ago, 0 comments)
#616 · append_kv_cache's documentation is out of date (reyoung, opened 5 days ago, 1 comment)
#615 · The PyTorch API reference for flashinfer.rope.apply_rope_pos_ids appears to contain inaccuracies (ovowei, closed 5 days ago, 1 comment)
#614 · [Question] Why is it necessary to use block.sync at this position? (luliyucoordinate, opened 6 days ago, 1 comment)
#613 · Fix compile error of OptionalCUDAGuard and device_of (reyoung, closed 5 days ago, 1 comment)
#612 · I cannot find FlexAttention-like api (BirdChristopher, closed 1 week ago, 2 comments)
#611 · misc: add device guard for kernels (jeejeelee, closed 1 week ago, 2 comments)
#610 · Fix potential multi-process compile issue (Pzzzzz5142, closed 1 week ago, 0 comments)
#609 · Improve parallelism in RoPE with pos_ids (nandor, closed 1 week ago, 4 comments)
#608 · Fix the alignment of o_frag (nandor, closed 1 week ago, 0 comments)
#607 · test: add DtypeKV template param in bench_batch_decode (dc3671, closed 1 week ago, 0 comments)
#606 · doc: improve the docstring of `append_paged_kv_cache` (yzh119, closed 1 week ago, 0 comments)
#605 · feat: simplify prefill JIT compilation (yzh119, closed 1 week ago, 0 comments)
#604 · doc: update readme (yzh119, closed 1 week ago, 0 comments)
#603 · doc: update documentation index (yzh119, closed 1 week ago, 0 comments)
#602 · perf: fix prefill kernel performance degradation (step 1) (yzh119, closed 1 week ago, 0 comments)
#601 · hotfix: fix rope tvm wrapper (yzh119, closed 1 week ago, 0 comments)
#600 · BatchDecodeWithPagedKVCache will never run to completion (Atream, closed 1 week ago, 2 comments)
#599 · feat: add `rotary_dim` argument to rope APIs for partial apply rope (yzh119, closed 2 weeks ago, 0 comments)
#598 · hotfix: fix import issue in #597 (yzh119, closed 2 weeks ago, 1 comment)
#597 · Fix the type of `paged_kv_cache` in append (nandor, closed 2 weeks ago, 0 comments)
#596 · [Question] Overflow risks when batch size and sequence length grow extremely large (rchardx, opened 2 weeks ago, 1 comment)
#595 · [Question] very small performance gain for cascade append on GQA (hewr1993, closed 1 week ago, 3 comments)
#594 · misc: refactor cutlass includes (yzh119, closed 2 weeks ago, 0 comments)
#593 · [doc] mock triton import (yzh119, closed 2 weeks ago, 0 comments)
#592 · perf: reduce the read and write of shared memory in the FusedAddRMSNormKernel (Abatom, closed 2 weeks ago, 6 comments)
#591 · [Feature Request] Add an argument to control the number of CTAs used in attention APIs (yzh119, opened 2 weeks ago, 0 comments)
#590 · include convert latency in bench_append_paged_kv_cache (abcdabcd987, closed 2 weeks ago, 0 comments)
#589 · bugfix: gemm_sm90 compilation error (abcdabcd987, closed 2 weeks ago, 0 comments)
#588 · perf: fix the performance issue of `append_paged_kv_cache` (yzh119, closed 2 weeks ago, 1 comment)
#587 · Improve the precision of the FusedAddRMSNormKernel function (Abatom, closed 2 weeks ago, 4 comments)
#586 · feat: CUDAGraph compatibility of multi-level cascade inference APIs (yzh119, closed 2 weeks ago, 0 comments)
#585 · feat: support cached cos/sin in rope APIs (yzh119, closed 2 weeks ago, 0 comments)
#584 · ci: setup pre-commit (yzh119, closed 2 weeks ago, 0 comments)