LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Multihead arch with cutlass fused multihead attention #1976

Open almaudoh opened 5 months ago

almaudoh commented 5 months ago

CUTLASS implementation of a fused multi-head attention layer, giving about a 10% speedup on an A100.
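For readers unfamiliar with the operation being fused, here is a minimal, unfused CPU reference for one attention head. It is only an illustrative sketch, not code from this PR and not CUTLASS: the function name `AttentionHead`, the row-major `[n x d]` layout, and the absence of masking or bias are all assumptions. A fused kernel is expected to produce the same result while avoiding a round trip of the score matrix through global memory; multi-head attention applies this per head on separate slices of Q, K, and V.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference (unfused) scaled dot-product attention for ONE head.
// q, k, v are row-major [n x d] matrices; the result is row-major [n x d].
// Assumes n > 0 and d > 0; no mask and no bias are applied.
std::vector<float> AttentionHead(const std::vector<float>& q,
                                 const std::vector<float>& k,
                                 const std::vector<float>& v,
                                 std::size_t n, std::size_t d) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(d));
  std::vector<float> out(n * d, 0.0f);
  std::vector<float> scores(n);

  for (std::size_t i = 0; i < n; ++i) {
    // Step 1: scores[j] = dot(q_i, k_j) / sqrt(d), tracking the row maximum.
    float max_score = std::numeric_limits<float>::lowest();
    for (std::size_t j = 0; j < n; ++j) {
      float dot = 0.0f;
      for (std::size_t c = 0; c < d; ++c) dot += q[i * d + c] * k[j * d + c];
      scores[j] = dot * scale;
      max_score = std::max(max_score, scores[j]);
    }

    // Step 2: numerically stable softmax over the row (subtract the maximum).
    float sum = 0.0f;
    for (std::size_t j = 0; j < n; ++j) {
      scores[j] = std::exp(scores[j] - max_score);
      sum += scores[j];
    }

    // Step 3: output row i is the softmax-weighted sum of the rows of v.
    for (std::size_t j = 0; j < n; ++j) {
      const float w = scores[j] / sum;
      for (std::size_t c = 0; c < d; ++c) out[i * d + c] += w * v[j * d + c];
    }
  }
  return out;
}
```

The three steps are written as separate loops on purpose; a fused implementation combines them in a single kernel so the intermediate `scores` stay in on-chip memory.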

@todo