ROCm/aotriton
Ahead of Time (AOT) Triton Math Library
MIT License
42 stars
15 forks
issues
Misc changes and performance tuning for 0.8b release
#57
xinyazhang
closed
1 week ago
0
[Queued PR] Port fixes from 0.7.2b
#56
xinyazhang
closed
1 week ago
1
Restore the support of causal=True and seqlen_q != seqlen_k
#55
xinyazhang
closed
1 week ago
1
[Documentation]: large bf16 inputs lead to NaN
#54
xinyazhang
opened
3 weeks ago
0
Add versioning support in multiple levels.
#53
xinyazhang
closed
2 weeks ago
0
libaotriton_v2.so: Fix 'argument list too long' error
#52
prarit
closed
1 month ago
1
Add docker based package builder and switch to system compiler
#51
xinyazhang
closed
2 weeks ago
8
Kernel Storage V2
#50
xinyazhang
closed
2 weeks ago
7
GQA Support
#49
xinyazhang
closed
3 weeks ago
0
Code Clean Up
#48
xinyazhang
closed
1 month ago
0
[Documentation]: Forward kernel returns NaN when inputs are irregular (including causal) and sm_scale is 0
#47
xinyazhang
opened
1 month ago
1
Merge improvements of 0.7.1b release into main
#46
xinyazhang
closed
1 month ago
1
FA Kernel Update for Accuracy and Performance
#45
xinyazhang
closed
2 months ago
0
Ignore colon suffixes in gcnArchName
#44
xinyazhang
closed
2 months ago
0
Fix numerical error by applying qk_scale at inner loop instead of outer loop
#43
xinyazhang
closed
2 months ago
2
Add cmake option AOTRITON_NAME_SUFFIX to resolve name conflicts
#42
xinyazhang
closed
2 months ago
1
Add PyTorch compatibility matrix to README.md
#41
xinyazhang
closed
2 months ago
0
Support hipGraph usage in PyTorch
#40
xinyazhang
closed
3 months ago
3
Improve Backward Performance and Navi31 Support
#39
xinyazhang
closed
3 months ago
35
[Documentation]: The overall tuning idea of aotriton
#38
hubotao1
opened
4 months ago
6
[Perf] Is it possible that the kernels wrapped with AOT have the similar performance comparing with the original ones?
#37
xinji1
opened
4 months ago
1
Switch to upstream Triton compiler, and related changes
#36
xinyazhang
closed
4 months ago
3
[Issue]: build fail: ModuleNotFoundError: No module named 'triton._C.libtriton.triton'
#35
minzhezhou
closed
5 months ago
1
[Issues]: The Gap between AOT and JIT Triton on Flash Attention kernel
#34
jinsong-mao
opened
5 months ago
0
Install .so if AOTRITON_NO_SHARED is OFF
#33
jithunnair-amd
closed
5 months ago
0
[Issue]: failed to run the tune_flash.py
#32
jinsong-mao
opened
5 months ago
13
Add varlen support to AOTriton's Flash Attention
#31
xinyazhang
closed
5 months ago
3
[Issue]: How to run benchmark tests
#30
jinsong-mao
closed
5 months ago
8
Refactor the build system
#29
xinyazhang
closed
6 months ago
1
[Issue]: Unable to build, Unknown CMake command "pybind11_add_module"
#28
RandUser123sa
closed
1 week ago
10
[Feature]: CDNA1 Support
#27
IMbackK
opened
6 months ago
0
Adding mutex.h for TE pytorch extension compilation
#26
wangye805
closed
6 months ago
0
the atten_bwd_dk_dv is bad in performance on mi300x
#25
jinsong-mao
closed
5 months ago
1
[mGPU] Run hipModuleLoadDataEx for each GPU device.
#24
xinyazhang
closed
6 months ago
0
Resolve cmake conflicts when adding aotriton into TE via add_subdirectory
#23
wangye805
closed
6 months ago
0
Add FP32 and Bias to fulfill the functionalities required by `torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION`
#22
xinyazhang
closed
7 months ago
0
[Question] Autotune kernel based on `third_party/triton`
#21
xinji1
closed
7 months ago
2
<vector> is required regardless of AOTRITON_USE_ZSTD
#20
xinyazhang
closed
7 months ago
1
Add new triton kernel debug_fill_dropout_rng
#19
xinyazhang
closed
7 months ago
0
[Issue]: Pytorch fails to compile locally due to aotriton failing to build the hsaco objects
#18
Zakhrov
closed
1 month ago
9
[Feature]: Fix the mandatory boundary_check when loading bias tensor
#17
xinyazhang
opened
7 months ago
0
[Feature]: Memory Efficient Flash Attention for gfx1100 (7900xtx)
#16
supernovae
closed
3 months ago
31
[Feature]: C++ version `mk_aotensor`
#15
xinji1
closed
7 months ago
2
Add matrix bias to forward/backward kernel
#14
xinyazhang
closed
7 months ago
1
Switch Tuning database to SQLite3 for Incremental Tuning
#13
xinyazhang
closed
7 months ago
0
[Documentation]: Shall we modify the configurations in `v2python` for the other kernels?
#12
xinji1
opened
8 months ago
5
[Issue]: Release Tags
#11
trixirt
closed
8 months ago
2
Fix the performance regression introduced during support of irregular shapes.
#10
xinyazhang
closed
8 months ago
0
Update README.md
#9
groenenboomj
closed
8 months ago
1
Add strides to all input tensors
#8
xinyazhang
closed
9 months ago
0