Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0
391 stars, 55 forks

use_fuse_mask_softmax #412

Open xiezipeng-ML opened 2 years ago

xiezipeng-ML commented 2 years ago

Measure the performance gain from use_fuse_mask_softmax.

oneflow branch: python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/release/mt5_opt/cu112

Corresponding oneflow commit: 2d080aa

libai branch: use_fuse_mask_softmax

In projects/T5/configs/t5_model_config.py, measure the throughput difference between model.cfg.scale_mask_softmax_fusion = False and model.cfg.scale_mask_softmax_fusion = True.
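A minimal sketch of preparing the two configurations to compare. It only shows the flag being flipped; the `model` object here is a hypothetical stand-in built from `SimpleNamespace`, not the real config object from projects/T5/configs/t5_model_config.py, and `make_variants` is an illustrative helper, not part of LiBai.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-in for the `model` config defined in
# projects/T5/configs/t5_model_config.py (the real object is not reproduced here).
model = SimpleNamespace(cfg=SimpleNamespace(scale_mask_softmax_fusion=False))


def make_variants(base):
    """Return two independent copies of `base`, keyed by the fusion flag value."""
    variants = {}
    for flag in (False, True):
        variant = copy.deepcopy(base)  # keep the base config untouched
        variant.cfg.scale_mask_softmax_fusion = flag
        variants[flag] = variant
    return variants


variants = make_variants(model)
for flag, variant in variants.items():
    print(f"scale_mask_softmax_fusion = {variant.cfg.scale_mask_softmax_fusion}")
```

Each variant would then be trained with otherwise identical settings so that any throughput difference can be attributed to the flag.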

@ouyangyu @chengtbf @strint

xiezipeng-ML commented 2 years ago

Settings: batch size = 4, gradient accumulation steps = 8, AMP enabled, activation checkpointing enabled.

| 1n1g (1 node, 1 GPU) | use_fuse_mask_softmax = False | use_fuse_mask_softmax = True |
| --- | --- | --- |
| Throughput | total_throughput: 152.35 samples/s | total_throughput: 158.00 samples/s |
| GPU Memory | 3145 MiB | 3335 MiB |

| 1n4g (1 node, 4 GPUs) | use_fuse_mask_softmax = False | use_fuse_mask_softmax = True |
| --- | --- | --- |
| Throughput | total_throughput: 109.33 samples/s | total_throughput: 112.39 samples/s |
| GPU Memory | 2445 MiB | 2545 MiB |
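To put the numbers above in relative terms, here is a small sketch that computes the percentage change in throughput and GPU memory when the fused kernel is turned on. The figures are copied from the measurements above; the dictionary layout and the `relative_change` helper are illustrative only.

```python
# Measurements copied from the results above:
# (throughput in samples/s, GPU memory in MiB).
results = {
    "1n1g": {"unfused": (152.35, 3145), "fused": (158.00, 3335)},
    "1n4g": {"unfused": (109.33, 2445), "fused": (112.39, 2545)},
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0


summary = {}
for setup, runs in results.items():
    base_tput, base_mem = runs["unfused"]
    fused_tput, fused_mem = runs["fused"]
    tput_gain = relative_change(base_tput, fused_tput)
    mem_overhead = relative_change(base_mem, fused_mem)
    summary[setup] = (tput_gain, mem_overhead)
    print(f"{setup}: throughput {tput_gain:+.2f}%, GPU memory {mem_overhead:+.2f}%")

# 1n1g: throughput +3.71%, GPU memory +6.04%
# 1n4g: throughput +2.80%, GPU memory +4.09%
```

In both setups the throughput gain is about 3 to 4 percent, with a somewhat larger relative increase in GPU memory.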

@chengtbf @strint @ouyangyu