ROCm / triton
Development repository for the Triton language and compiler
MIT License · 80 stars · 22 forks
Issues
# | Title | Author | State | Age | Comments
#507 | Compatible with PyTorch's triton | netw0rkf10w | closed | 2 months ago | 23
#506 | Eliminate ternary if statement | groenenboomj | closed | 4 months ago | 2
#505 | [PYTORCH UT] Inaccuracies in argmax/argmin UT | jataylo | closed | 2 months ago | 9
#504 | [DRAFT] MFMA 64x4 + 4x4 FA | binarman | opened | 5 months ago | 0
#503 | [WIP][DRAFT] LLIR interpreter tests | binarman | opened | 5 months ago | 0
#502 | Unsupported conversion from f8E4M3B11FNUZ to bf16 | zstreet87 | closed | 4 months ago | 1
#501 | Support tt.trans in stream pipeliner | htyu | closed | 4 months ago | 1
#500 | Revert "[MFMA] Move operand casts to AccelerateMatMul pass" | zhanglx13 | closed | 5 months ago | 0
#499 | Autotune matrix_instr_nonkdim | htyu | closed | 5 months ago | 4
#498 | added ir dump parsing script | jtang10 | closed | 5 months ago | 0
#497 | Add the FA v2 "varlen" API | vgokhale | closed | 5 months ago | 0
#496 | [MFMA] Remove redundant checks | binarman | closed | 5 months ago | 0
#495 | [GEMM] [Tuning] Add an option to disable warmup compilation | zhanglx13 | closed | 5 months ago | 1
#494 | `AssertionError("cannot reassign constexpr full_range in the loop")`: failing PyTorch UTs | jataylo | closed | 2 months ago | 7
#493 | [CI] run AMD backend test on PRs | micmelesse | closed | 4 months ago | 1
#492 | fa decode example fp16/int4kv | scxiao | closed | 3 months ago | 4
#491 | Provide correct warp size to compiler | joviliast | closed | 4 months ago | 5
#490 | Add InReg attribute to function args | zhanglx13 | closed | 5 months ago | 6
#489 | cannot import name 'cdiv' from 'triton' (unknown location) | ehartford | closed | 2 months ago | 2
#488 | enable predicate in lds load store | scxiao | closed | 3 months ago | 0
#487 | [MFMA][FRONTEND] Add more options for forced mfma layout sizes | binarman | closed | 3 months ago | 9
#486 | [GEMM] [Tuning] Option to try different initialization strategies | vgokhale | closed | 5 months ago | 0
#485 | Fix FA tutorial | zhanglx13 | closed | 5 months ago | 1
#484 | support FA to configure multiple waves in a workgroup | scxiao | opened | 5 months ago | 0
#483 | AMD specific scheduling pass for TTGIR instructions | oplavsic | closed | 3 months ago | 0
#482 | Fix a bug in fastPath condition | zhanglx13 | closed | 5 months ago | 0
#481 | WMMA conversions | joviliast | closed | 3 months ago | 1
#480 | improve chain dot checking | scxiao | closed | 5 months ago | 0
#479 | [MFMA] Cleanup mfma pipeline | binarman | closed | 5 months ago | 0
#478 | refine tolerance in checking GEMM correctness | scxiao | closed | 5 months ago | 0
#477 | [MFMA] Move operand casts to AccelerateMatMul pass | binarman | closed | 5 months ago | 5
#476 | Add option for larger LDS vecSize | zhanglx13 | closed | 4 months ago | 1
#475 | [FA-qk-fp8] Add fp8 FA to 06-fused-attention-fwd-transV.py | zhanglx13 | closed | 5 months ago | 0
#474 | Enable swizzling SMEM for transposed dot operand | htyu | closed | 5 months ago | 0
#473 | fix warp size in reduce op for main branch | scxiao | closed | 4 months ago | 1
#472 | [MFMA][Test] Add scripts generating mfma related lit tests | binarman | closed | 3 months ago | 4
#471 | fix warp size in lowering reduce op | scxiao | closed | 5 months ago | 0
#470 | [Tuning] Add `matrix_instr_nonkdim` in the tuning space | zhanglx13 | closed | 5 months ago | 0
#469 | [MFMA] Support 64x4 and 4x64 tile size | binarman | closed | 5 months ago | 3
#468 | Add shortcut for creation fp16, bfp16 | joviliast | closed | 5 months ago | 1
#467 | Revert "Add autotuning for FA (#459)" | vgokhale | closed | 5 months ago | 0
#466 | Fix vecSize for fp8 and int8 on MI300 | zhanglx13 | closed | 5 months ago | 0
#465 | remove git modules for tree sitter | jayfurmanek | closed | 5 months ago | 0
#464 | [AMD backend] Treat metadata as a tuple instead of a dict | zhanglx13 | closed | 5 months ago | 0
#463 | [GEMM][Tutorial] Refine test_correctness | zhanglx13 | closed | 5 months ago | 0
#462 | support configure multiple waves in flash-attention | scxiao | closed | 5 months ago | 0
#461 | Flash Attention Triton for Mi50s | ThePerfectComputer | closed | 5 months ago | 3
#460 | Merge AOT features | groenenboomj | closed | 5 months ago | 2
#459 | Add autotuning for FA | vgokhale | closed | 5 months ago | 0
#458 | Dockerfile and test | micmelesse | closed | 5 months ago | 0