DefTruth/CUDA-Learn-Notes
📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
GNU General Public License v3.0
1.56k stars, 165 forks
issues
[FlashAttention] Refactor toy-flash-attn codes part-1
#156
DefTruth
closed
2 days ago
0
[Softmax] Update Online Softmax bindings
#155
DefTruth
closed
4 days ago
0
[HGEMM] CuTe HGEMM debug Makefile target
#154
DefTruth
closed
4 days ago
0
Hello, regarding the speed of online safe softmax: there seems to be no noticeable improvement
#153
lzcchl
closed
4 days ago
2
[HGEMM] Update toy-hgemm library 0.1.0
#152
DefTruth
closed
5 days ago
0
Question about understanding the softmax implementation, hoping an expert can clarify
#151
lzcchl
closed
1 week ago
2
[HGEMM] Update toy-hgemm library 0.1.0
#150
DefTruth
closed
1 week ago
0
[HGEMM] Update toy-hgemm library 0.1.0
#149
DefTruth
closed
1 week ago
0
[HGEMM] Update RTX 3080 Laptop perf
#148
DefTruth
closed
1 week ago
0
Include SageAttention Kernel
#147
jason-huang03
opened
1 week ago
3
[HGEMM] Release toy-hgemm library 0.1.0
#146
DefTruth
closed
1 week ago
0
[HGEMM] Release toy-hgemm library 0.1.0
#145
DefTruth
closed
1 week ago
0
[HGEMM] manually init/destroy cublas handle
#144
DefTruth
closed
1 week ago
0
[HGEMM] Add show_memory option to bench
#143
DefTruth
closed
1 week ago
0
[HGEMM] Add gc.collect to HGEMM bench script
#142
DefTruth
closed
1 week ago
0
[HGEMM] clear tensor cache avoid OOM
#141
DefTruth
closed
1 week ago
0
[HGEMM] CuTe HGEMM with Thread Block Swizzle
#140
DefTruth
closed
1 week ago
0
[HGEMM] Add MMA HGEMM NN C++ benchmark
#139
DefTruth
closed
1 week ago
0
[HGEMM] fix cublas hgemm handle error
#138
DefTruth
closed
1 week ago
0
[HGEMM] Update HGEMM L20/4090 Bench
#137
DefTruth
closed
1 week ago
0
[HGEMM] refactor HGEMM cpp benchmark
#136
DefTruth
closed
1 week ago
0
[HGEMM] trans mat b from row major -> col major
#135
DefTruth
closed
2 weeks ago
0
[HGEMM] Add CuTe HGEMM with SMEM Swizzle
#134
DefTruth
closed
2 weeks ago
0
Update embedding.cu
#133
TheManWhoIsStupid
closed
2 weeks ago
0
[HGEMM] Add large MNK block swizzle policy
#132
DefTruth
closed
2 weeks ago
0
Bump up to v2.6
#131
DefTruth
closed
2 weeks ago
0
[README] Update README.md
#130
DefTruth
closed
2 weeks ago
0
[README] Update README
#129
DefTruth
closed
2 weeks ago
0
[README] Add contents lists
#128
DefTruth
closed
3 weeks ago
0
[Blog] Illustrated DeepSpeed-Ulysses & Megatron-LM TP/SP
#127
DefTruth
closed
3 weeks ago
0
[HGEMM] Update NVIDIA L20/4090 Perf plots
#126
DefTruth
closed
3 weeks ago
0
Bump up to v2.5
#125
DefTruth
closed
4 weeks ago
0
[HGEMM] Add HGEMM L20/4090 benchmark figures
#124
DefTruth
closed
4 weeks ago
0
[PERF] Update HGEMM benchmark scripts
#123
DefTruth
closed
4 weeks ago
7
[HGEMM] Add NVIDIA RTX 3090 Laptop perf plot
#122
DefTruth
closed
1 month ago
0
[HGEMM] Add plot tflops function
#121
DefTruth
closed
1 month ago
0
[HGEMM] Update HGEMM README.md
#120
DefTruth
closed
1 month ago
0
[HGEMM] Add NVIDIA RTX 4090 benchmark
#119
DefTruth
closed
1 month ago
0
[README] Update HGEMM/SGEMM Supported Matrix
#118
DefTruth
closed
1 month ago
0
[HGEMM] Update HGEMM/SGEMM Supported Matrix
#117
DefTruth
closed
1 month ago
0
[HGEMM] Update HGEMM Supported Matrix
#116
DefTruth
closed
1 month ago
0
Update README.md
#115
DefTruth
closed
1 month ago
0
[Docs] Update HGEMM/SGEMM Supported Matrix
#114
DefTruth
closed
1 month ago
0
[README] Update HGEMM/SGEMM Supported matrix
#113
DefTruth
closed
1 month ago
0
[HGEMM] Update HGEMM/SGEMM Supported Matrix
#112
DefTruth
closed
1 month ago
0
[HGEMM] Add M=N=K option for benchmark
#111
DefTruth
closed
1 month ago
0
[HGEMM][Docs] Add HGEMM Supported Matrix
#110
DefTruth
closed
1 month ago
0
[HGEMM] Update HGEMM MMA/WMMA Usage
#109
DefTruth
closed
1 month ago
0
[HGEMM] Try reduce registers usage
#108
DefTruth
closed
1 month ago
0
[HGEMM] add -Xptxas -v compile flag
#107
DefTruth
closed
1 month ago
0