HazyResearch/ThunderKittens
Tile primitives for speedy kernels
MIT License
1.52k stars
58 forks
issues
256 Kernel
#60
ArjunParthasarathy
opened
4 days ago
0
Danfu09/tk2
#59
DanFu09
closed
1 week ago
1
Refactor complex code into separate files + add old kernel versions
#58
ArjunParthasarathy
opened
3 weeks ago
0
Could you provide a valid mirror?
#57
ziyuhuang123
opened
3 weeks ago
1
cannot find -lcuda: No such file or directory
#56
ziyuhuang123
opened
3 weeks ago
0
Could you provide a gemm kernel?
#55
ziyuhuang123
opened
3 weeks ago
0
h100.cu(97): error: "wait" is ambiguous
#54
ziyuhuang123
opened
3 weeks ago
1
Fix typos.
#53
jasondavies
closed
1 month ago
0
Add complex tile functionality to test TK FFT kernels
#52
ArjunParthasarathy
closed
4 weeks ago
0
Merge 32x32 FFTConv Kernel into TK
#51
ArjunParthasarathy
closed
1 month ago
0
When will ThunderKittens support AMD GPUs, specifically the W7900?
#50
lahmuller
opened
1 month ago
0
Build infra
#49
benjaminfspector
closed
1 month ago
0
Confusing Comment in rt.cuh
#48
KAOZUOI
opened
1 month ago
0
Redo mbar
#47
benjaminfspector
closed
2 months ago
1
Swizzy
#46
benjaminfspector
closed
2 months ago
1
c++20 does not work?
#45
ziyuhuang123
opened
2 months ago
1
Support for global load/store padding
#44
Hprairie
opened
3 months ago
0
Template error
#43
Hprairie
opened
3 months ago
1
Cross-GPU portability
#42
janEbert
opened
3 months ago
0
Is it possible to support non-contiguous input tensor ?
#41
ProHuper
opened
3 months ago
0
Error running make
#40
BurhanUlTayyab
closed
3 months ago
0
Support `softmax_scale` and `dropout` options for fwd_attend_ker_dim ?
#39
ProHuper
closed
3 months ago
0
Support `softmax_scale` and `dropout` options for fwd_attend_ker_dim ?
#38
ProHuper
closed
3 months ago
0
[bug report][4090 attn] cudaCheckError(): too many resources requested for launch
#37
kexve
opened
4 months ago
1
TK kernelize M1 for loop body
#36
LeoXinhaoLee
closed
4 months ago
0
Support for TPUs?
#35
jaanli
opened
4 months ago
0
fix async bug
#34
xiayuqing0622
closed
4 months ago
0
[bug report] h100 attn_causal kernel
#33
xiayuqing0622
opened
4 months ago
3
attn_bias rel-pos support to the FAv2 example
#32
vadimkantorov
closed
4 months ago
1
add suport for a100 atten
#31
MichoChan
closed
4 months ago
0
why there is no zero(attn) before compute q@k.t in h100 example?
#30
xiayuqing0622
closed
4 months ago
2
added tma reductions
#29
Aaryan0404
closed
4 months ago
1
[feat] add simple half gemm example
#28
luliyucoordinate
closed
4 months ago
0
Load with ldmatrix
#27
liyanc
opened
4 months ago
2
Add support for head dimension 128
#26
perklet
opened
4 months ago
4
Two questions
#25
dongrixinyu
closed
4 months ago
1
fix a small typo
#24
lancerts
closed
4 months ago
1
[Feature Request] GEMM benchmarks and FP8 Support
#23
jwfromm
opened
4 months ago
7
unable to reproduce attn_causal speeds
#22
152334H
closed
4 months ago
3
[Question] Supported compute capabilities?
#21
bayley
opened
4 months ago
3
Based gen
#20
simran-arora
closed
4 months ago
0
Causal
#19
Aaryan0404
closed
4 months ago
0
Cutlass parity
#18
benjaminfspector
closed
5 months ago
0
Kvstates
#17
simran-arora
closed
5 months ago
0
Descriptor refactor
#16
benjaminfspector
closed
5 months ago
1
Binding
#15
simran-arora
closed
5 months ago
0
conda setup docs
#14
jordan-benjamin
closed
5 months ago
0
add based upate step
#13
simran-arora
closed
5 months ago
0
Cooperative refactor
#12
benjaminfspector
closed
5 months ago
1
Attention h100
#11
Aaryan0404
closed
6 months ago
0