flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0 · 1.1k stars · 98 forks
Issues (newest first)
- #291 bugfix: Fix the behavior of decode cuda graph wrapper (yzh119, closed 2 months ago, 0 comments)
- #290 bugfix: fix the synchronization issue in distributed operators (yzh119, closed 2 months ago, 0 comments)
- #289 feat: initial support of distributed operators (yzh119, closed 2 months ago, 0 comments)
- #288 [DO NOT MERGE] cudagraph: use cuda dynamic parallelism to dispatch kernels (yzh119, closed 1 month ago, 1 comment)
- #287 cmake: fix DECODE_F8_DTYPES and DECODE_FP8_DTYPES discrepancy (ibsidorenko, closed 2 months ago, 2 comments)
- #286 feat: Separate Q and KV dtypes for decode (Yard1, closed 2 months ago, 7 comments)
- #285 [Q&A] Any plans for different dtypes for Q (query) and KV (kv-cache)? (ibsidorenko, closed 2 months ago, 4 comments)
- #283 bugfix: fix the data type of aligned_alloc in handlers (yzh119, closed 3 months ago, 0 comments)
- #282 feat: add group gemm operators (yzh119, closed 3 months ago, 1 comment)
- #281 bugfix: fix cudagraph-compatible prefill/decode apis (yzh119, closed 3 months ago, 0 comments)
- #280 Add dtype checks for q-kv tensors (Yard1, closed 3 months ago, 0 comments)
- #279 misc: suppress compilation warning of fastdiv (yzh119, closed 3 months ago, 0 comments)
- #278 perf: add fastdiv for uint32_t (yzh119, closed 3 months ago, 0 comments)
- #277 feat: support cuda graph for batched multi-query (prefill/append) attention (yzh119, closed 3 months ago, 0 comments)
- #276 Revert "feat: support cuda graph for batched multi-query (prefill/append) attention" (yzh119, closed 3 months ago, 0 comments)
- #275 feat: support cuda graph for batched multi-query (prefill/append) attention (yzh119, closed 3 months ago, 0 comments)
- #274 hotfix: fix setup.py (yzh119, closed 3 months ago, 0 comments)
- #273 fp8: add calibration scale for decode attention operators (yzh119, closed 3 months ago, 2 comments)
- #272 git: ignore generated directory in documentation (yzh119, closed 3 months ago, 0 comments)
- #271 doc: add some documentation for attention with mask API (yzh119, closed 3 months ago, 0 comments)
- #270 doc: update documentation for mask layout (yzh119, closed 3 months ago, 0 comments)
- #269 3rdparty: add dependency to cutlass and composable kernels (yzh119, closed 3 months ago, 0 comments)
- #268 3rdparty: add mscclpp dependency (yzh119, closed 3 months ago, 0 comments)
- #267 bugfix: avoid potential illegal memory access (yzh119, closed 3 months ago, 0 comments)
- #266 feat: support custom attention mask in prefill/append attention kernels (yzh119, closed 3 months ago, 0 comments)
- #265 bugfix: use `FlagHeads` instead of `SubtractLeft` for cuda 118 (yzh119, closed 3 months ago, 0 comments)
- #264 doc: bugfix in kv-layout docs (yzh119, closed 3 months ago, 0 comments)
- #263 doc: update documentation (yzh119, closed 3 months ago, 0 comments)
- #262 [WIP] refactor: make `gqa_group_size` a function argument instead of template parameter (yzh119, closed 2 months ago, 1 comment)
- #261 build raises "cub::BlockAdjacentDifference<__nv_bool, 1024, 1, 1, 860>" has no member "SubtractLeft" (WanBenLe, closed 3 months ago, 8 comments)
- #259 sampling: fused speculative sampling kernels (yzh119, closed 3 months ago, 0 comments)
- #258 [Bug report] BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 3 (merrymercy, closed 2 months ago, 3 comments)
- #257 [Feature request] Support attention logits cap with tanh (merrymercy, closed 2 months ago, 5 comments)
- #256 perf: initial cuda graph support (yzh119, closed 3 months ago, 1 comment)
- #255 bugfix: fix pybind class bindings (yzh119, closed 3 months ago, 0 comments)
- #254 Qwen1.5-32B failed: BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 5 (QwertyJack, closed 2 months ago, 1 comment)
- #253 perf: use page-locked host memory for auxiliary data structure on CPU (yzh119, closed 3 months ago, 0 comments)
- #252 cmake: backward compatibility for TVM_HOME (yzh119, closed 3 months ago, 0 comments)
- #251 cmake: rename TVM_HOME to TVM_SOURCE_DIR (yzh119, closed 3 months ago, 0 comments)
- #250 Can BatchDecodeWithPaddedKVCache be used in cascade inference? (joey12300, closed 2 days ago, 2 comments)
- #249 CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-nl8se4dx/flashinfer-0.0.4+cu118torch2.2/include/flashinfer/attention/decode.cuh: line 871 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) (lucasjinreal, opened 3 months ago, 2 comments)
- #248 Circular import error when importing built-from-source flashinfer (vedantroy, opened 3 months ago, 1 comment)
- #247 Fix compile/assert on group_size (Qubitium, closed 2 months ago, 1 comment)
- #246 Add group_size 7 and fix compat with Yi 1.5 34b (Qubitium, closed 3 months ago, 3 comments)
- #245 multiple definition of `cuda::__3::pipeline... (jpf888, opened 3 months ago, 0 comments)
- #244 Move -Wno-switch-bool argument to cxx from nvcc (mgerstgrasser, closed 3 months ago, 0 comments)
- #243 Compilation fails due to "-Wno-switch-bool" nvcc flag (mgerstgrasser, closed 3 months ago, 0 comments)
- #242 Can the Volta/Tesla architectures be supported? (alexngng, closed 3 weeks ago, 2 comments)
- #241 bugfix: Fix dispatcher in src directory (yzh119, closed 3 months ago, 0 comments)
- #240 bugfix: fix the `generate_dispatch_inc` script (yzh119, closed 3 months ago, 0 comments)