flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Fix the alignment of o_frag #608

Closed nandor closed 1 week ago

nandor commented 1 week ago

Because o_frag was not always aligned to a 16-byte boundary, the memcpy on it, which is implemented with 4x float (16-byte) moves, crashed under cuda-gdb when the code was compiled with -G.
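For readers unfamiliar with the failure mode, here is a minimal host-side C++ sketch of the general technique: forcing a fragment buffer onto a 16-byte boundary so that 16-byte copies are legal. It is not the patch from this PR. `AlignedFrag`, `is_aligned`, and `copy_vec4` are hypothetical names invented for illustration, and the fragment size of 8 floats is an assumption. In CUDA device code the same guarantee would typically be requested with `alignas(16)` or `__align__(16)` on the array declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical stand-in for an output fragment (name and size are
// illustrative, not taken from FlashInfer). alignas(16) guarantees
// that `data` starts on a 16-byte boundary wherever the struct lives.
struct AlignedFrag {
  alignas(16) float data[8];
};

// Copies n floats in chunks of 4 floats (16 bytes). This mirrors the
// access pattern that needs 16-byte alignment once the compiler lowers
// each chunk to a single vector move; the asserts make the requirement
// explicit instead of leaving it to fault at run time.
inline void copy_vec4(float* dst, const float* src, std::size_t n) {
  assert(n % 4 == 0);
  assert(is_aligned(dst, 16) && is_aligned(src, 16));
  for (std::size_t i = 0; i < n; i += 4) {
    std::memcpy(dst + i, src + i, 4 * sizeof(float));
  }
}
```

Declaring two `AlignedFrag` objects and calling `copy_vec4(b.data, a.data, 8)` passes both alignment checks. A plain `float` array carries only 4-byte alignment by default, so the same call on such an array may trip the assertion.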