flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0 · 756 stars · 62 forks
Issues
#352 tests: add more unittests for logits cap (yzh119, closed 2 hours ago, 0 comments)
#351 tests: add more unittests for logits cap (yzh119, closed 3 hours ago, 0 comments)
#350 hotfix: fix the decode kernel with logits cap (yzh119, closed 3 hours ago, 0 comments)
#349 flashinfer.page.append_paged_kv_cache will cause an invalid memory access if device != 'cuda:0' (Tomorrowdawn, opened 21 hours ago, 1 comment)
#348 Fix a bug related to causal mask (rchardx, closed 22 hours ago, 0 comments)
#347 refactor: use sink symbol instead of a placeholder register in row sum mma implementation (yzh119, closed 1 day ago, 0 comments)
#346 docs: update README for ScaleLLM (zhyncs, closed 1 day ago, 0 comments)
#345 ci: remove redundant `NUM_FRAGS_Z` (yzh119, closed 2 days ago, 0 comments)
#344 ci: update CHANGELOG (yzh119, closed 2 days ago, 0 comments)
#343 refactor: reduce the binary size of batch decode kernels (yzh119, closed 3 days ago, 0 comments)
#342 misc: use https for submodule spdlog (yzh119, closed 3 days ago, 0 comments)
#341 linker: use `mcmodel=medium` and `--no-relax` to compilation flags for large wheels (yzh119, closed 4 days ago, 0 comments)
#340 [CMake][Bugfix] Set default value for FLASHINFER_GEN_MASK_MODES (Lunderberg, closed 4 days ago, 0 comments)
#339 feat: customize `logits_soft_cap` value (yzh119, closed 4 days ago, 1 comment)
#338 benchmark: add batch prefill with ragged kv-cache benchmark (yzh119, closed 4 days ago, 0 comments)
#337 bugfix: fix the `forward_return_lse` function in `BatchPrefillWithRaggedKVCache` class (yzh119, closed 5 days ago, 0 comments)
#336 perf: more options for kv tile size (yzh119, closed 6 days ago, 0 comments)
#335 There are precision errors compared with flash_attn_2_cuda.varlen_fwd (Amanda-Barbara, opened 1 week ago, 3 comments)
#334 bugfix: fix std::max mismatch in #333 (yzh119, closed 1 week ago, 0 comments)
#333 bugfix: fix the scheduler behavior of large page size (yzh119, closed 1 week ago, 0 comments)
#332 Why did we perform an operation similar to data alignment here instead of directly adding 4? (luliyucoordinate, closed 1 week ago, 2 comments)
#331 doc: bugfix on documentation about mask usage (yzh119, closed 1 week ago, 0 comments)
#330 Sizes of tensors must match except in dimension 0 when creating mask (llx-08, closed 1 week ago, 1 comment)
#329 perf: change minimal `kv_chunk_size` back to 128 (yzh119, closed 1 week ago, 0 comments)
#328 ci: separate `update_whl_index` from github action files (yzh119, closed 1 week ago, 0 comments)
#327 chore(main): release 0.0.7 (github-actions[bot], closed 4 days ago, 1 comment)
#326 fix: disable other warp layout because of large binary size (yzh119, closed 1 week ago, 0 comments)
#325 Bugfix: bugfix to #322 (yzh119, closed 1 week ago, 0 comments)
#324 chore(main): release 0.0.6 (github-actions[bot], closed 1 week ago, 1 comment)
#323 How are prefill and decode kernels different? (AgrawalAmey, closed 1 week ago, 3 comments)
#322 perf: use 1x4 warp layout for small query length (yzh119, closed 1 week ago, 0 comments)
#321 ci: use python3 for release wheel workflow (yzh119, closed 1 week ago, 0 comments)
#320 How large the page_size could be? (llx-08, closed 1 week ago, 4 comments)
#319 ci: fix setuptools version (yzh119, closed 1 week ago, 0 comments)
#318 doc: bump doc version to v0.0.5 (yzh119, closed 1 week ago, 0 comments)
#317 feat: add `use_tensor_cores` option to decode kernels to accelerate GQA (yzh119, closed 1 week ago, 0 comments)
#316 Installation fails immediately: ModuleNotFoundError: No module named 'torch' (Brennanzuz, opened 1 week ago, 2 comments)
#315 bugfix: fix cascade test (yzh119, closed 1 week ago, 0 comments)
#314 Lacks prebuild whl for PyTorch2.3+cu118 (heheda12345, closed 1 week ago, 2 comments)
#313 perf: slight optimization on merge states (yzh119, opened 2 weeks ago, 2 comments)
#312 refactor: simplify kernel interface (yzh119, closed 2 weeks ago, 0 comments)
#311 Feature request: support non-contiguous tensors for attention (Yard1, opened 2 weeks ago, 1 comment)
#310 perf: split kv-cache for prefill/append kernels (yzh119, closed 1 week ago, 0 comments)
#309 perf: use cub's native BlockLoad/BlockStore for sampling kernels (yzh119, opened 2 weeks ago, 0 comments)
#308 perf: use packed bit array for attention mask (yzh119, closed 2 weeks ago, 0 comments)
#307 refactor: use combined div/mod for write lse (yzh119, closed 2 weeks ago, 0 comments)
#306 refactor: remove `page_size` from template parameters for prefill kernels (yzh119, closed 2 weeks ago, 0 comments)
#305 Faster compile/ci (Qubitium, closed 2 weeks ago, 0 comments)
#304 perf: optimize warp layout for prefill operator (yzh119, closed 1 week ago, 0 comments)
#303 test: fix fp8 calibration test (yzh119, closed 2 weeks ago, 0 comments)