Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0
391 stars, 55 forks

Use fuse multi head att #417

Open xiezipeng-ML opened 2 years ago

xiezipeng-ML commented 2 years ago

Batch size = 4, gradient accumulation steps = 8, AMP enabled, activation checkpointing enabled.

1n1g (1 node, 1 GPU)   use_fuse_multi_head_att = False   use_fuse_multi_head_att = True
total_throughput       151.70 samples/s                  155.41 samples/s
GPU memory             3147 MiB                          3129 MiB

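Both self_att and cross_att in the encoder and decoder now use fuse_multihead_att. A quick test on machine No. 28 shows only a limited gain (155.41 vs. 151.70 samples/s, about 2.4% higher throughput, and 18 MiB less GPU memory), probably because transpose is still called too many times. In the next commit I plan to remove the if/else branch entirely and use fuse_multihead_att by default, then test again.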
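The transpose overhead mentioned above can be illustrated with a minimal sketch. It assumes NumPy as a stand-in for OneFlow, and `split_heads`, `merge_heads`, `unfused_attention` and `layout_free_attention` are hypothetical helpers, not LiBai or OneFlow APIs. The unfused path reshapes and transposes q, k and v into a (batch, heads, seq, head_dim) layout and transposes the result back. In a GPU framework, each such layout change typically costs its own kernel launch and memory pass. The second function computes the same attention directly on the (batch, seq, heads, head_dim) layout with `np.einsum`, so no separate transpose step is written out. (einsum may still permute data internally.) This is roughly the kind of work a fused multi-head attention kernel tries to absorb. It is an equivalence check under these assumptions, not the implementation behind `fuse_multihead_att`.

```python
import numpy as np


def split_heads(x, num_heads):
    # Hypothetical helper: (batch, seq, hidden) -> (batch, heads, seq, head_dim)
    # via reshape + explicit transpose.
    b, s, hidden = x.shape
    return x.reshape(b, s, num_heads, hidden // num_heads).transpose(0, 2, 1, 3)


def merge_heads(x):
    # Hypothetical helper: (batch, heads, seq, head_dim) -> (batch, seq, hidden)
    # via explicit transpose + reshape.
    b, h, s, d = x.shape
    return x.transpose(0, 2, 1, 3).reshape(b, s, h * d)


def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def unfused_attention(q, k, v, num_heads):
    # Unfused path: three transposes to split q, k, v into heads,
    # a transpose of K for the score matmul, and one more to merge heads.
    qh, kh, vh = (split_heads(t, num_heads) for t in (q, k, v))
    d = qh.shape[-1]
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d)
    return merge_heads(softmax(scores) @ vh)


def layout_free_attention(q, k, v, num_heads):
    # Same math on the (batch, seq, heads, head_dim) layout; einsum absorbs
    # the head/sequence reordering, so no separate transpose step is written.
    b, s_q, hidden = q.shape
    d = hidden // num_heads
    qh = q.reshape(b, s_q, num_heads, d)
    kh = k.reshape(b, k.shape[1], num_heads, d)
    vh = v.reshape(b, v.shape[1], num_heads, d)
    scores = np.einsum("bqhd,bkhd->bhqk", qh, kh) / np.sqrt(d)
    out = np.einsum("bhqk,bkhd->bqhd", softmax(scores), vh)
    return out.reshape(b, s_q, hidden)


# Cross-attention style inputs: keys/values are longer than queries.
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 5, 8))
k = rng.standard_normal((2, 7, 8))
v = rng.standard_normal((2, 7, 8))
a = unfused_attention(q, k, v, num_heads=2)
b = layout_free_attention(q, k, v, num_heads=2)
print(a.shape, np.allclose(a, b))
```

The example uses different query and key/value lengths because the comment says both self_att and cross_att take the fused path. Passing the same tensor as q, k and v gives the self-attention case.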

@chengtbf @strint @ouyangyu @CPFLAME

xiezipeng-ML commented 2 years ago

@chengtbf @CPFLAME @strint @ouyangyu