facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

[refactor] Generalization of dual_gemm_silu_identity_mul #1141

Closed: warpuv closed this 2 weeks ago

warpuv commented 2 weeks ago

Generalizes dual_gemm_silu_identity_mul to accept a custom activation function.
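For readers unfamiliar with the op, here is a rough pure-Python reference of what such a generalization could compute. It assumes the fused kernel evaluates act(x·W0ᵀ + b0) * (x·W1ᵀ + b1), with act = SiLU in the existing version. The name `dual_gemm_act_identity_mul`, its signature, and the helper `linear` are hypothetical and used only for illustration; they are not the PR's actual API.

```python
import math


def silu(v):
    # SiLU (swish): v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def gelu_tanh(v):
    # Tanh approximation of GELU
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3)))


def linear(x, w, b):
    # x: [rows][in_features], w: [out_features][in_features], b: [out_features]
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b_j
         for w_row, b_j in zip(w, b)]
        for row in x
    ]


def dual_gemm_act_identity_mul(x, w0, b0, w1, b1, act=silu):
    # Hypothetical reference, not the fused CUDA kernel:
    # two GEMMs on the same input, an activation on the first result,
    # identity on the second, then an elementwise product.
    x0 = linear(x, w0, b0)
    x1 = linear(x, w1, b1)
    out = [[act(a) * g for a, g in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    return x0, x1, out


x = [[1.0, 2.0]]
w0, b0 = [[1.0, 0.0]], [0.0]
w1, b1 = [[0.0, 1.0]], [1.0]

# Default activation reproduces the SiLU variant; passing gelu_tanh swaps it.
_, _, out_silu = dual_gemm_act_identity_mul(x, w0, b0, w1, b1)
_, _, out_gelu = dual_gemm_act_identity_mul(x, w0, b0, w1, b1, act=gelu_tanh)
```

With `act=silu` this matches the behaviour the current op name implies; passing a different callable is the only change needed in the reference, which is the point of the generalization.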

What does this PR do?

Fixes #1140

Before submitting

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

pansershrek commented 2 weeks ago

@danthe3rd, @lw, @zyan0 Could you review this PR, please? This update will open the path to implementing a fused GELUTanh for Gemma models.

danthe3rd commented 2 weeks ago

Hi @pansershrek and @warpuv, thanks for opening this PR. In principle I'm happy to accept this line of contributions; however, you should be aware that:

cc @tridao

pansershrek commented 2 weeks ago

Hi @danthe3rd! Thank you for your reply. Could you explain your advice about the B0 and B1 columns in more detail? We don't understand the architectural differences as well as you do :)