Open xiezipeng-ML opened 2 years ago
1n1g | use_fuse_mask_softmax = False | use_fuse_mask_softmax = True |
---|---|---|
Throughput | 152.35 samples/s | 158.00 samples/s |
GPU Memory | 3145 MiB | 3335 MiB |

1n4g | use_fuse_mask_softmax = False | use_fuse_mask_softmax = True |
---|---|---|
Throughput | 109.33 samples/s | 112.39 samples/s |
GPU Memory | 2445 MiB | 2545 MiB |
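
To make the tables easier to compare, the relative change can be computed directly from the reported numbers. This is only a minimal sketch: `results`, `relative_change`, and `summary` are illustrative names, not part of libai or OneFlow, and the values are copied from the tables above.

```python
# (unfused, fused) pairs copied from the benchmark tables above.
results = {
    "1n1g": {"throughput": (152.35, 158.00), "memory_mib": (3145, 3335)},
    "1n4g": {"throughput": (109.33, 112.39), "memory_mib": (2445, 2545)},
}


def relative_change(before, after):
    """Return the percent change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Percent change per setup and metric, rounded to two decimals.
summary = {
    setup: {name: round(relative_change(*pair), 2) for name, pair in metrics.items()}
    for setup, metrics in results.items()
}

for setup, metrics in summary.items():
    print(setup, metrics)
```

By this calculation the fused path gives roughly 3.7% more throughput on 1n1g and 2.8% on 1n4g, while GPU memory rises by roughly 6.0% and 4.1% respectively.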
@chengtbf @strint @ouyangyu
Testing the performance gain from use_fuse_mask_softmax.
oneflow branch: python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/release/mt5_opt/cu112
Corresponding oneflow commit: 2d080aa
libai branch: use_fuse_mask_softmax
In `projects/T5/configs/t5_model_config.py`, measure the throughput difference between `model.cfg.scale_mask_softmax_fusion = False` and `model.cfg.scale_mask_softmax_fusion = True`. @ouyangyu @chengtbf @strint
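
For readers unfamiliar with the option being toggled: as far as I understand it, the flag swaps the separate scale, mask, and softmax steps on the attention scores for a single fused kernel. The pure-Python sketch below shows only the unfused reference computation. It is not libai or OneFlow code; `scale_mask_softmax`, the `fill_value` of -10000.0, and the convention that `True` in the mask means "hidden" are all assumptions made for illustration.

```python
import math


def scale_mask_softmax(scores, mask, scale, fill_value=-10000.0):
    """Unfused reference: scale the scores, mask them, then softmax each row.

    scores: list of rows of floats (attention scores).
    mask:   list of rows of bools; True marks a hidden position (assumed convention).
    """
    out = []
    for row, mask_row in zip(scores, mask):
        # Step 1 and 2: scale, then overwrite hidden positions with a large negative value.
        vals = [fill_value if hidden else s * scale for s, hidden in zip(row, mask_row)]
        # Step 3: numerically stable softmax over the row.
        peak = max(vals)
        exps = [math.exp(v - peak) for v in vals]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


scores = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
mask = [[False, True, True], [False, False, True]]
probs = scale_mask_softmax(scores, mask, scale=0.5)
for row in probs:
    print(row)
```

A fused kernel would produce the same probabilities in one pass instead of three separate ops, which is where any throughput difference would come from.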