DefTruth / CUDA-Learn-Notes

📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
GNU General Public License v3.0
1.57k stars 166 forks

Update embedding.cu #133

Closed. TheManWhoIsStupid closed this 2 weeks ago

TheManWhoIsStupid commented 2 weeks ago

Wrong index computation in the f32x4 and f16x8 variants.
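The diff itself is not shown here, so the exact fix is unknown. As a rough illustration of the index arithmetic that vectorized embedding lookups such as f32x4 and f16x8 depend on, the sketch below models the per-thread offsets on the host in plain C++. Every name in it (`embedding_lookup_model`, `VEC`, `emb_size`) is hypothetical, and the layout (one block per token, `emb_size / VEC` threads per block) is an assumption rather than the repository's code. The point it shows is that the thread index has to be scaled by the vector width before it is added to the row offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a vectorized embedding lookup (hypothetical names,
// not the repository's kernel). Assumed layout: one block per token,
// and each thread copies VEC contiguous elements of the embedding row.
// VEC = 4 models an f32x4-style kernel, VEC = 8 an f16x8-style kernel
// (float is used for both here, only the index math is modeled).
template <int VEC>
std::vector<float> embedding_lookup_model(const std::vector<int>& idx,
                                          const std::vector<float>& weight,
                                          int emb_size) {
  // Assumption: the row length is a multiple of the vector width.
  assert(emb_size % VEC == 0);
  const int threads_per_block = emb_size / VEC;
  const std::size_t row = static_cast<std::size_t>(emb_size);

  std::vector<float> output(idx.size() * row);
  for (std::size_t bx = 0; bx < idx.size(); ++bx) {      // plays blockIdx.x
    for (int tx = 0; tx < threads_per_block; ++tx) {     // plays threadIdx.x
      // Scale the thread index by the vector width. Using tx unscaled
      // would make neighbouring threads overlap and leave the tail of
      // the row unwritten.
      const std::size_t col = static_cast<std::size_t>(tx) * VEC;
      const std::size_t src = static_cast<std::size_t>(idx[bx]) * row + col;
      const std::size_t dst = bx * row + col;
      for (int v = 0; v < VEC; ++v) {
        output[dst + v] = weight[src + v];  // one vector load/store in CUDA
      }
    }
  }
  return output;
}
```

Calling `embedding_lookup_model<4>(idx, weight, emb_size)` and `embedding_lookup_model<8>(idx, weight, emb_size)` on the same inputs should give identical outputs, which is a cheap way to cross-check a vectorized variant against another one.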