baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] Is the 182 TFLOPS figure model TFLOPS or hardware TFLOPS? #91

Open xuanr opened 1 year ago

xuanr commented 1 year ago


Questions

Hi, a question about this statement in the README: "Based on the optimization techniques above, we achieved a throughput of 182 TFLOPS for the 7B model on a cluster of a thousand A800 GPUs, with a peak GPU FLOPs utilization of 58.3%." Does the 182 TFLOPS refer to model TFLOPS or hardware TFLOPS?

The distinction is the one drawn in https://proceedings.mlsys.org/paper_files/paper/2023/file/e851ca7b43815718fbbac8afb2246bf8-Paper-mlsys2023.pdf:

"We define the model FLOPs utilization (MFU) and hardware FLOPs utilization (HFU) similar to Chowdhery, et al. [6]. Model FLOPs are the floating point operations required to perform a single forward and backward pass (single iteration) regardless of the implementations and hardware limitations. As a result, model FLOPs are hardware and implementation independent and only depend on the underlying model. On the other hand, the hardware FLOPs represent the floating point operations that are actually performed on the hardware per iteration. Therefore, if an implementation requires activation recomputation (for example ours), then the hardware FLOPs are going to be larger than model FLOPs. We provide a tight lower bound formula for the model and hardware FLOPs in Appendix A. For our method, the hardware to model FLOPs ratio is approximately 1 + s/6h."
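A quick sanity check of the numbers (an illustrative sketch, not taken from the README): the 58.3% figure is consistent with dividing 182 TFLOPS by the A800's 312 TFLOPS BF16 tensor-core peak, and the 1 + s/6h ratio from the quoted paper can be evaluated with Baichuan-7B's hidden size and sequence length (both assumed here to be 4096) to see how far hardware and model TFLOPS would diverge if full activation recomputation were used:

```python
# Back-of-the-envelope check, not the official calculation.
# Assumptions: A800 BF16 dense tensor-core peak = 312 TFLOPS;
# Baichuan-7B hidden size h = 4096 and training sequence length s = 4096.

peak_tflops = 312.0        # assumed A800 BF16 peak
achieved_tflops = 182.0    # per-GPU throughput quoted in the README

print(f"FLOPs utilization: {achieved_tflops / peak_tflops:.1%}")  # 58.3%

# Hardware-to-model FLOPs ratio with full activation recomputation,
# per the MLSys 2023 paper quoted above: 1 + s / (6h).
s, h = 4096, 4096
ratio = 1 + s / (6 * h)
print(f"HFU/MFU ratio with recomputation: {ratio:.3f}")           # ~1.167

# If 182 TFLOPS were hardware TFLOPS under recomputation, the corresponding
# model-FLOPs throughput would be smaller by that ratio.
print(f"Implied model TFLOPS: {achieved_tflops / ratio:.0f}")     # ~156
```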


xuanr commented 1 year ago

Could you share the exact calculation behind the 182 TFLOPS figure?
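For reference, the usual back-of-the-envelope recipe for a per-GPU TFLOPS figure looks like the sketch below; this is not necessarily how the README number was produced, and the throughput value used is a placeholder:

```python
# Generic per-GPU TFLOPS estimate from training throughput.
# The ~6 FLOPs per parameter per token approximation (2 forward + 4 backward)
# ignores the attention term; all concrete numbers here are placeholders.

n_params = 7.0e9                  # approximate Baichuan-7B parameter count
tokens_per_sec_per_gpu = 4000.0   # hypothetical measured per-GPU throughput

flops_per_token = 6 * n_params
achieved_tflops = flops_per_token * tokens_per_sec_per_gpu / 1e12
print(f"Estimated per-GPU throughput: {achieved_tflops:.0f} TFLOPS")  # 168 with these inputs
```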