DefTruth/CUDA-Learn-Notes
📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
GNU General Public License v3.0
1.57k stars, 166 forks
Update embedding.cu
#133
Closed
TheManWhoIsStupid closed this 2 weeks ago
TheManWhoIsStupid commented 2 weeks ago
Wrong index computation in the f32x4 and f16x8 kernels.
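The PR text does not say which index expression was wrong, so what follows is only a rough illustration and not the change made in embedding.cu. It emulates on the CPU the indexing that a vectorized embedding lookup typically needs, under assumptions that are ours: one block per token (`blockIdx.x` selects `idx[bx]`), each thread copies `VEC` contiguous elements, `emb_size` is divisible by `VEC`, and `float` stands in for half precision in the 8-wide case. The names `embedding_vec_lookup` and `embedding_ref` are hypothetical. Under this layout, the column offset inside a row is `threadIdx.x * VEC`. That offset is added both to the source row base `idx[bx] * emb_size` and to the destination row base `bx * emb_size`. A global thread id would run past the end of the row if used as that offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a vectorized embedding lookup (assumed layout):
// one "block" per token, each "thread" copies VEC contiguous elements of a row.
// VEC = 4 mimics an f32x4 load; VEC = 8 mimics an f16x8 load (float used here).
template <int VEC>
std::vector<float> embedding_vec_lookup(const std::vector<int>& idx,
                                        const std::vector<float>& weight,
                                        int emb_size) {
  assert(emb_size % VEC == 0);  // assumption: row length divisible by VEC
  const int n = static_cast<int>(idx.size());
  const int threads_per_block = emb_size / VEC;
  std::vector<float> output(static_cast<std::size_t>(n) * emb_size);
  for (int bx = 0; bx < n; ++bx) {                  // plays the role of blockIdx.x
    for (int t = 0; t < threads_per_block; ++t) {   // plays the role of threadIdx.x
      const int tx = t * VEC;  // column offset within the row, NOT a global id
      const std::size_t src = static_cast<std::size_t>(idx[bx]) * emb_size + tx;
      const std::size_t dst = static_cast<std::size_t>(bx) * emb_size + tx;
      for (int v = 0; v < VEC; ++v) output[dst + v] = weight[src + v];
    }
  }
  return output;
}

// Scalar reference: output row i is a copy of weight row idx[i].
std::vector<float> embedding_ref(const std::vector<int>& idx,
                                 const std::vector<float>& weight,
                                 int emb_size) {
  std::vector<float> output(idx.size() * static_cast<std::size_t>(emb_size));
  for (std::size_t i = 0; i < idx.size(); ++i)
    for (int j = 0; j < emb_size; ++j)
      output[i * emb_size + j] =
          weight[static_cast<std::size_t>(idx[i]) * emb_size + j];
  return output;
}
```

You can check each vectorized variant against the scalar reference, for example `embedding_vec_lookup<4>(idx, weight, 16) == embedding_ref(idx, weight, 16)`. This is a cheap way to catch a row-offset mistake before you port the same arithmetic into a kernel.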