Multihead arch with cutlass fused multihead attention

LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.

GNU General Public License v3.0

2.37k stars 523 forks source link

Open almaudoh opened 5 months ago

almaudoh commented 5 months ago

cutlass implementation of fused multihead attention layer giving about 10% speedup on A100.

@todo