facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Update for complete functional support of ck fmha forward/backward #1061

Closed qianfengz closed 4 months ago

qianfengz commented 5 months ago

Building on the ck fmha functions already available on the main branch, this PR adds the following:

  1. Update ck fmha forward for better performance
  2. Enable ck fmha forward to use dropout
  3. Add a standalone kernel that generates uniform random numbers, used for dropout verification
  4. Add a complete ck fmha backward implementation
  5. Point the composable_kernel_tiled submodule at the ck develop branch
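
As an illustration of item 3, here is a minimal pure-Python sketch of how a table of uniform random numbers can be used to check dropout against a reference. It is not the kernel or the test from this PR: the function names, the "keep when the sample is >= p" convention, and the 1/(1-p) rescaling are assumptions chosen for the example.

```python
import random


def keep_mask(uniform, p):
    """Build a 0/1 keep mask from uniform samples in [0, 1).

    Assumed convention: an element is kept when its sample is >= p.
    """
    return [[1.0 if u >= p else 0.0 for u in row] for row in uniform]


def apply_dropout(probs, uniform, p):
    """Zero the dropped attention probabilities and rescale the kept ones by 1/(1-p)."""
    scale = 1.0 / (1.0 - p)
    mask = keep_mask(uniform, p)
    return [
        [a * m * scale for a, m in zip(prob_row, mask_row)]
        for prob_row, mask_row in zip(probs, mask)
    ]


# Hypothetical 2x3 attention probabilities and one uniform sample per element.
probs = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
rng = random.Random(0)
uniform = [[rng.random() for _ in row] for row in probs]
reference = apply_dropout(probs, uniform, p=0.5)
```

Because the mask is rebuilt from the same random numbers the attention kernel would consume, a reference computed this way can be compared element by element with the kernel output.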

To test/verify:

#> pytest tests/test_mem_eff_attention.py::test_forward -k "ckF"
#> pytest tests/test_mem_eff_attention.py::test_backward -k "ckB"
#> pytest tests/test_mem_eff_attention.py::test_dropout
#> pytest tests/test_mem_eff_attention.py::test_dropout_backward_ck
#> pytest tests/test_mem_eff_attention.py::test_decoder

qianfengz commented 5 months ago

> Are you able to see the lint failures? There are a few to fix here.

Let me check.

qianfengz commented 5 months ago

@bottler There is a copyright check failure for the file xformers/csrc/attention/hip_fmha/generate_instances.py. Is there any way to fix it? If not, I can just remove the script, because it is only a developer convenience for creating the instance cpp files.

bottler commented 5 months ago

> @bottler There is a copyright check failure for the file xformers/csrc/attention/hip_fmha/generate_instances.py. Is there any way to fix it? If not, I can just remove the script, because it is only a developer convenience for creating the instance cpp files.

Can you add # noqa: C801 to the first line of the file?

qianfengz commented 5 months ago

> > @bottler There is a copyright check failure for the file xformers/csrc/attention/hip_fmha/generate_instances.py. Is there any way to fix it? If not, I can just remove the script, because it is only a developer convenience for creating the instance cpp files.
>
> Can you add # noqa: C801 to the first line of the file?

Added.
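
For readers who want to reproduce the change, the suggestion above amounts to making `# noqa: C801` the first line of the script. A small hypothetical helper (not part of the PR; the function name is made up for this example) could apply it idempotently:

```python
from pathlib import Path

MARKER = "# noqa: C801"


def add_noqa_first_line(path):
    """Prepend the marker as line 1 unless it is already there.

    Returns True if the file was modified. Caveat: if the file starts with
    a shebang line, prepending would push the shebang off line 1.
    """
    file = Path(path)
    text = file.read_text()
    first_line = text.split("\n", 1)[0]
    if first_line.strip() == MARKER:
        return False
    file.write_text(MARKER + "\n" + text)
    return True
```

Calling it twice on the same file leaves a single marker line, so it is safe to rerun.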

bottler commented 5 months ago

I think dispatch.py has had a bad merge with recent changes on main? You are losing unpadded_lse logic from _dispatch_bw.

qianfengz commented 5 months ago

> I think dispatch.py has had a bad merge with recent changes on main? You are losing unpadded_lse logic from _dispatch_bw.

Fixed, thank you for pointing that out.

bottler commented 4 months ago

The last test failures are not caused by this PR. The only thing left is to run Black on test_mem_eff_attention.py and generate_instances.py.