flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0 · 1.1k stars · 98 forks
Issues (newest first)
- #291 bugfix: Fix the behavior of decode cuda graph wrapper (yzh119, closed 2 months ago, 0 comments)
- #290 bugfix: fix the synchronization issue in distributed operators (yzh119, closed 2 months ago, 0 comments)
- #289 feat: initial support of distributed operators (yzh119, closed 2 months ago, 0 comments)
- #288 [DO NOT MERGE] cudagraph: use cuda dynamic parallelism to dispatch kernels (yzh119, closed 1 month ago, 1 comment)
- #287 cmake: fix DECODE_F8_DTYPES and DECODE_FP8_DTYPES discrepancy (ibsidorenko, closed 2 months ago, 2 comments)
- #286 feat: Separate Q and KV dtypes for decode (Yard1, closed 2 months ago, 7 comments)
- #285 [Q&A] Any plans for different dtypes for Q (query) and KV (kv-cache)? (ibsidorenko, closed 2 months ago, 4 comments)
- #283 bugfix: fix the data type of aligned_alloc in handlers (yzh119, closed 3 months ago, 0 comments)
- #282 feat: add group gemm operators (yzh119, closed 3 months ago, 1 comment)
- #281 bugfix: fix cudagraph-compatible prefill/decode apis (yzh119, closed 3 months ago, 0 comments)
- #280 Add dtype checks for q-kv tensors (Yard1, closed 3 months ago, 0 comments)
- #279 misc: suppress compilation warning of fastdiv (yzh119, closed 3 months ago, 0 comments)
- #278 perf: add fastdiv for uint32_t (yzh119, closed 3 months ago, 0 comments)
- #277 feat: support cuda graph for batched multi-query (prefill/append) attention (yzh119, closed 3 months ago, 0 comments)
- #276 Revert "feat: support cuda graph for batched multi-query (prefill/append) attention" (yzh119, closed 3 months ago, 0 comments)
- #275 feat: support cuda graph for batched multi-query (prefill/append) attention (yzh119, closed 3 months ago, 0 comments)
- #274 hotfix: fix setup.py (yzh119, closed 3 months ago, 0 comments)
- #273 fp8: add calibration scale for decode attention operators (yzh119, closed 3 months ago, 2 comments)
- #272 git: ignore generated directory in documentation (yzh119, closed 3 months ago, 0 comments)
- #271 doc: add some documentation for attention with mask API (yzh119, closed 3 months ago, 0 comments)
- #270 doc: update documentation for mask layout (yzh119, closed 3 months ago, 0 comments)
- #269 3rdparty: add dependency to cutlass and composable kernels (yzh119, closed 3 months ago, 0 comments)
- #268 3rdparty: add mscclpp dependency (yzh119, closed 3 months ago, 0 comments)
- #267 bugfix: avoid potential illegal memory access (yzh119, closed 3 months ago, 0 comments)
- #266 feat: support custom attention mask in prefill/append attention kernels (yzh119, closed 3 months ago, 0 comments)
- #265 bugfix: use `FlagHeads` instead of `SubtractLeft` for cuda 118 (yzh119, closed 3 months ago, 0 comments)
- #264 doc: bugfix in kv-layout docs (yzh119, closed 3 months ago, 0 comments)
- #263 doc: update documentation (yzh119, closed 3 months ago, 0 comments)
- #262 [WIP] refactor: make `gqa_group_size` a function argument instead of template parameter (yzh119, closed 2 months ago, 1 comment)
- #261 build raises "cub::BlockAdjacentDifference<__nv_bool, 1024, 1, 1, 860>" has no member "SubtractLeft" (WanBenLe, closed 3 months ago, 8 comments)
- #259 sampling: fused speculative sampling kernels (yzh119, closed 3 months ago, 0 comments)
- #258 [Bug report] BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 3 (merrymercy, closed 2 months ago, 3 comments)
- #257 [Feature request] Support attention logits cap with tanh (merrymercy, closed 2 months ago, 5 comments)
- #256 perf: initial cuda graph support (yzh119, closed 3 months ago, 1 comment)
- #255 bugfix: fix pybind class bindings (yzh119, closed 3 months ago, 0 comments)
- #254 Qwen1.5-32B failed: BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 5 (QwertyJack, closed 2 months ago, 1 comment)
- #253 perf: use page-locked host memory for auxiliary data structure on CPU (yzh119, closed 3 months ago, 0 comments)
- #252 cmake: backward compatibility for TVM_HOME (yzh119, closed 3 months ago, 0 comments)
- #251 cmake: rename TVM_HOME to TVM_SOURCE_DIR (yzh119, closed 3 months ago, 0 comments)
- #250 Can BatchDecodeWithPaddedKVCache be used in cascade inference? (joey12300, closed 2 days ago, 2 comments)
- #249 CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-nl8se4dx/flashinfer-0.0.4+cu118torch2.2/include/flashinfer/attention/decode.cuh: line 871 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) (lucasjinreal, opened 3 months ago, 2 comments)
- #248 Circular import error when importing built-from-source flashinfer (vedantroy, opened 3 months ago, 1 comment)
- #247 Fix compile/assert on group_size (Qubitium, closed 2 months ago, 1 comment)
- #246 Add group_size 7 and fix compat with Yi 1.5 34b (Qubitium, closed 3 months ago, 3 comments)
- #245 multiple definition of `cuda::__3::pipeline... (jpf888, opened 3 months ago, 0 comments)
- #244 Move -Wno-switch-bool argument to cxx from nvcc (mgerstgrasser, closed 3 months ago, 0 comments)
- #243 Compilation fails due to "-Wno-switch-bool" nvcc flag (mgerstgrasser, closed 3 months ago, 0 comments)
- #242 Can the Volta/Tesla architectures be supported? (alexngng, closed 3 weeks ago, 2 comments)
- #241 bugfix: Fix dispatcher in src directory (yzh119, closed 3 months ago, 0 comments)
- #240 bugfix: fix the `generate_dispatch_inc` script (yzh119, closed 3 months ago, 0 comments)