facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Tiny upstream pr #1094

Closed qianfengz closed 2 weeks ago

qianfengz commented 3 weeks ago

This PR provides:

  1. Synchronizes to the latest composable kernel commit, which adds an inline-asm implementation of the fp32 to bf16 RTN conversion. The inline-asm conversion improves performance when BF16 with RTN rounding is used.
  2. Adds compiler options for compiling the C++ extension on ROCm/HIP, which improve the performance of the HIP FMHA BWD on ROCm 6.2.
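
For readers unfamiliar with the conversion mentioned in item 1, here is a minimal bit-level sketch in Python. It assumes RTN means round to nearest with ties to even. It is not the composable kernel code, which uses inline asm on the GPU, and the function names `fp32_to_bf16_rtn` and `bf16_to_fp32` are made up for illustration.

```python
import struct


def fp32_to_bf16_rtn(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (round to nearest, ties to even)."""
    # Reinterpret the value as its 32-bit IEEE-754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN (exponent all ones, nonzero mantissa): keep the top half and
    # force a mantissa bit so the result stays a NaN after truncation.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0:
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds halfway cases to even.
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFFFFFF
    return bits >> 16


def bf16_to_fp32(h: int) -> float:
    """Widen a bf16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]
```

Plain truncation would simply drop the low 16 bits. The rounding step is the extra work per element that a hardware-specific implementation can make cheaper.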

danthe3rd commented 3 weeks ago

Thanks! can you fix the formatting of setup.py tho? (see linter CI)

qianfengz commented 2 weeks ago

Are any further layout changes needed?

danthe3rd commented 2 weeks ago

Sorry, forgot about that PR :) Let me merge it