flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0 · 756 stars · 62 forks
Issues
#352 tests: add more unittests for logits cap (yzh119, closed 2 hours ago, 0 comments)
#351 tests: add more unittests for logits cap (yzh119, closed 3 hours ago, 0 comments)
#350 hotfix: fix the decode kernel with logits cap (yzh119, closed 3 hours ago, 0 comments)
#349 flashinfer.page.append_paged_kv_cache will cause an invalid memory access if device != 'cuda:0' (Tomorrowdawn, opened 21 hours ago, 1 comment)
#348 Fix a bug related to causal mask (rchardx, closed 22 hours ago, 0 comments)
#347 refactor: use sink symbol instead of a placeholder register in row sum mma implementation (yzh119, closed 1 day ago, 0 comments)
#346 docs: update README for ScaleLLM (zhyncs, closed 1 day ago, 0 comments)
#345 ci: remove redundant `NUM_FRAGS_Z` (yzh119, closed 2 days ago, 0 comments)
#344 ci: update CHANGELOG (yzh119, closed 2 days ago, 0 comments)
#343 refactor: reduce the binary size of batch decode kernels (yzh119, closed 3 days ago, 0 comments)
#342 misc: use https for submodule spdlog (yzh119, closed 3 days ago, 0 comments)
#341 linker: use `mcmodel=medium` and `--no-relax` to compilation flags for large wheels (yzh119, closed 4 days ago, 0 comments)
#340 [CMake][Bugfix] Set default value for FLASHINFER_GEN_MASK_MODES (Lunderberg, closed 4 days ago, 0 comments)
#339 feat: customize `logits_soft_cap` value (yzh119, closed 4 days ago, 1 comment)
#338 benchmark: add batch prefill with ragged kv-cache benchmark (yzh119, closed 4 days ago, 0 comments)
#337 bugfix: fix the `forward_return_lse` function in `BatchPrefillWithRaggedKVCache` class (yzh119, closed 5 days ago, 0 comments)
#336 perf: more options for kv tile size (yzh119, closed 6 days ago, 0 comments)
#335 There are precision errors compared with flash_attn_2_cuda.varlen_fwd (Amanda-Barbara, opened 1 week ago, 3 comments)
#334 bugfix: fix std::max mismatch in #333 (yzh119, closed 1 week ago, 0 comments)
#333 bugfix: fix the scheduler behavior of large page size (yzh119, closed 1 week ago, 0 comments)
#332 Why did we perform an operation similar to data alignment here instead of directly adding 4? (luliyucoordinate, closed 1 week ago, 2 comments)
#331 doc: bugfix on documentation about mask usage (yzh119, closed 1 week ago, 0 comments)
#330 Sizes of tensors must match except in dimension 0 when creating mask (llx-08, closed 1 week ago, 1 comment)
#329 perf: change minimal `kv_chunk_size` back to 128 (yzh119, closed 1 week ago, 0 comments)
#328 ci: separate `update_whl_index` from github action files (yzh119, closed 1 week ago, 0 comments)
#327 chore(main): release 0.0.7 (github-actions[bot], closed 4 days ago, 1 comment)
#326 fix: disable other warp layout because of large binary size (yzh119, closed 1 week ago, 0 comments)
#325 Bugfix: bugfix to #322 (yzh119, closed 1 week ago, 0 comments)
#324 chore(main): release 0.0.6 (github-actions[bot], closed 1 week ago, 1 comment)
#323 How are prefill and decode kernels different? (AgrawalAmey, closed 1 week ago, 3 comments)
#322 perf: use 1x4 warp layout for small query length (yzh119, closed 1 week ago, 0 comments)
#321 ci: use python3 for release wheel workflow (yzh119, closed 1 week ago, 0 comments)
#320 How large the page_size could be? (llx-08, closed 1 week ago, 4 comments)
#319 ci: fix setuptools version (yzh119, closed 1 week ago, 0 comments)
#318 doc: bump doc version to v0.0.5 (yzh119, closed 1 week ago, 0 comments)
#317 feat: add `use_tensor_cores` option to decode kernels to accelerate GQA (yzh119, closed 1 week ago, 0 comments)
#316 Installation fails immediately: ModuleNotFoundError: No module named 'torch' (Brennanzuz, opened 1 week ago, 2 comments)
#315 bugfix: fix cascade test (yzh119, closed 1 week ago, 0 comments)
#314 Lacks prebuild whl for PyTorch2.3+cu118 (heheda12345, closed 1 week ago, 2 comments)
#313 perf: slight optimization on merge states (yzh119, opened 2 weeks ago, 2 comments)
#312 refactor: simplify kernel interface (yzh119, closed 2 weeks ago, 0 comments)
#311 Feature request: support non-contiguous tensors for attention (Yard1, opened 2 weeks ago, 1 comment)
#310 perf: split kv-cache for prefill/append kernels (yzh119, closed 1 week ago, 0 comments)
#309 perf: use cub's native BlockLoad/BlockStore for sampling kernels (yzh119, opened 2 weeks ago, 0 comments)
#308 perf: use packed bit array for attention mask (yzh119, closed 2 weeks ago, 0 comments)
#307 refactor: use combined div/mod for write lse (yzh119, closed 2 weeks ago, 0 comments)
#306 refactor: remove `page_size` from template parameters for prefill kernels (yzh119, closed 2 weeks ago, 0 comments)
#305 Faster compile/ci (Qubitium, closed 2 weeks ago, 0 comments)
#304 perf: optimize warp layout for prefill operator (yzh119, closed 1 week ago, 0 comments)
#303 test: fix fp8 calibration test (yzh119, closed 2 weeks ago, 0 comments)