eagle705 / presentation

presentation pdf collection

Training Compute-Optimal Large Language Models #17

Open eagle705 opened 1 year ago

eagle705 commented 1 year ago

Note

Author

Abstract

Introduction

[Figure 1; Figure A3]

Estimating the optimal parameter/training tokens allocation

3.1. Approach 1: Fix model sizes and vary number of training tokens

3.2. Approach 2: IsoFLOP profiles

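The IsoFLOP approach trains several model sizes at one fixed FLOP budget, fits a parabola to loss versus log model size, and reads the compute-optimal size off the valley of the parabola. A minimal sketch of that fitting step with synthetic data (the model sizes and loss curve here are illustrative, not numbers from the paper):

```python
import numpy as np

def isoflop_optimum(model_sizes, losses):
    """Fit a parabola to loss vs. log10(model size) and return
    the model size at the parabola's minimum (the IsoFLOP valley)."""
    logN = np.log10(model_sizes)
    a, b, c = np.polyfit(logN, losses, deg=2)  # loss ~ a*x^2 + b*x + c
    return 10 ** (-b / (2 * a))               # vertex of the parabola

# Synthetic IsoFLOP profile with its minimum placed at N = 1e9 params.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
losses = 2.0 + 0.05 * (np.log10(sizes) - 9.0) ** 2
print(isoflop_optimum(sizes, losses))  # recovers ~1e9
```

Repeating this for many FLOP budgets gives the N_opt(C) frontier that the paper then fits with a power law.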

3.3. Approach 3: Fitting a parametric loss function
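Approach 3 fits a parametric form L(N, D) = E + A/N^α + B/D^β: an irreducible loss E plus terms that shrink with parameter count N and training tokens D. A sketch using the fitted coefficients reported in the paper (E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28); treat these as the paper's estimates, not exact constants:

```python
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric loss from Approach 3: irreducible entropy term E
    plus finite-model (A/N^alpha) and finite-data (B/D^beta) terms."""
    return E + A / N**alpha + B / D**beta

# Evaluated at Chinchilla's operating point: 70B params, 1.4T tokens.
print(chinchilla_loss(70e9, 1.4e12))
```

Minimizing this expression subject to a compute constraint C ≈ 6ND yields the third estimate of the optimal N/D split.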

3.4. Optimal model scaling
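All three approaches land near a ≈ b ≈ 0.5, i.e. parameters and training tokens should scale equally with compute. Combining the standard approximation C ≈ 6ND with the paper's rough "about 20 tokens per parameter" ratio gives a back-of-the-envelope allocation rule (the ratio of 20 is a rounded rule of thumb, not an exact constant):

```python
def optimal_allocation(C, tokens_per_param=20, flops_per_param_token=6):
    """Split a FLOP budget C between parameters N and tokens D,
    assuming C = 6*N*D and the rule-of-thumb D = 20*N."""
    N = (C / (flops_per_param_token * tokens_per_param)) ** 0.5
    D = tokens_per_param * N
    return N, D

# At roughly Chinchilla's budget (~5.9e23 FLOPs) this returns
# approximately 70B parameters and 1.4T tokens.
N, D = optimal_allocation(5.88e23)
```

This is why Chinchilla, at the same compute budget as Gopher, uses a much smaller model trained on far more tokens.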

4. Chinchilla

4.1. Model and training details

4.2. Results


4.2.1. Language modeling

4.2.2. MMLU

Discussion & Conclusion


Appendix

Training set


D.3. Predicted compute optimal frontier for all three methods
