NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License
8.17k stars 1.35k forks

Fused RoPE for `thd` format #1756

Closed: yaox12 closed this 5 months ago

yaox12 commented 7 months ago

This adds fused RoPE support for the `thd` format, which is needed for the packed sequences feature in NeMo.

yaox12 commented 7 months ago

@crcrpar Can I get your review?