📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
GNU General Public License v3.0
1.57k stars, 166 forks
[Blog] Illustrated Guide to DeepSpeed-Ulysses & Megatron-LM TP/SP #127
Closed by DefTruth 3 weeks ago