A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License
A fused `apply_rotary_pos_emb` implementation for Megatron-Core #1746
Closed
yaox12 closed 1 year ago
This is a fused `apply_rotary_pos_emb` implementation for Megatron-Core. In my preliminary benchmark, it gives a 2x to 4x speedup over the unfused version. `batch_size=2` and `head_num=64` are fixed.
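For reference, the unfused rotary position embedding this issue fuses can be sketched as below. This is a minimal NumPy sketch of the standard RoPE computation (rotate-half formulation), not the actual Megatron-Core or Apex code; the function and argument names are illustrative assumptions.

```python
import numpy as np

def rotate_half(x):
    # Split the last dimension in half and swap the halves with a sign flip:
    # (x1, x2) -> (-x2, x1). This pairs up feature dimensions for rotation.
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate((-x2, x1), axis=-1)

def apply_rotary_pos_emb(t, freqs):
    # t: [seq, batch, heads, head_dim]; freqs: [seq, 1, 1, head_dim] angles.
    # Each pair of features is rotated by its position-dependent angle.
    return t * np.cos(freqs) + rotate_half(t) * np.sin(freqs)

# Tiny example (the benchmark in the issue fixes batch_size=2, head_num=64;
# small sizes are used here for readability).
t = np.random.randn(3, 2, 4, 8)
freqs = np.random.randn(3, 1, 1, 8)
out = apply_rotary_pos_emb(t, freqs)
print(out.shape)  # (3, 2, 4, 8)
```

The unfused version launches several elementwise kernels (split, negate, concatenate, two multiplies, an add) and materializes intermediates; a fused kernel does the whole rotation in one pass over memory, which is where the reported speedup comes from.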