Closed yaox12 closed 12 months ago
This PR fuses the sin/cos calculation of `freqs` and the data type conversion into the fused RoPE kernel, which eliminates 4 tiny element-wise kernels.
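For illustration only, here is a minimal pure-Python sketch of the idea, not the actual kernel code. It assumes the four removed kernels are `cos`, `sin`, and one cast each (the description does not list them), and that the rotation follows the common `x * cos + rotate_half(x) * sin` form. The names `cast_f16`, `rope_unfused`, and `rope_fused` are hypothetical.

```python
import math
import struct


def cast_f16(v):
    """Round a Python float to IEEE half precision (stand-in for a dtype cast)."""
    return struct.unpack("e", struct.pack("e", v))[0]


def rotate_half(x):
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    h = len(x) // 2
    return [-v for v in x[h:]] + x[:h]


def rope_unfused(x, freqs):
    # Assumed breakdown: four separate element-wise passes before the rotation.
    cos = [math.cos(f) for f in freqs]   # pass 1: cos
    sin = [math.sin(f) for f in freqs]   # pass 2: sin
    cos = [cast_f16(c) for c in cos]     # pass 3: cast cos
    sin = [cast_f16(s) for s in sin]     # pass 4: cast sin
    rot = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rot, cos, sin)]


def rope_fused(x, freqs):
    # Single pass: sin/cos and the cast are computed where they are consumed,
    # so no intermediate cos/sin buffers are materialized.
    h = len(x) // 2
    out = []
    for i, (xi, f) in enumerate(zip(x, freqs)):
        c = cast_f16(math.cos(f))
        s = cast_f16(math.sin(f))
        ri = -x[i + h] if i < h else x[i - h]  # rotate_half, computed inline
        out.append(xi * c + ri * s)
    return out


x = [1.0, 2.0, 3.0, 4.0]
freqs = [0.0, 0.5, 0.0, 0.5]
unfused_out = rope_unfused(x, freqs)
fused_out = rope_fused(x, freqs)
```

Both functions perform the same arithmetic in the same order, so `unfused_out` and `fused_out` match exactly. On a GPU the fused form would save the launch overhead and memory traffic of the small intermediate kernels.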
@crcrpar Ready for merging. Thanks.