We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
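For context on the pass@1 numbers quoted above: pass@k, as used for HumanEval-style evaluation, is the expected fraction of problems for which at least one of k sampled completions passes the unit tests. A minimal sketch of the standard unbiased estimator (following the Chen et al., 2021 convention; the function name and interface here are illustrative, not from the phi-1 paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of completions sampled for the problem
    c: number of those completions that pass the unit tests
    k: budget of attempts (k=1 for pass@1)
    """
    if n - c < k:
        # Too few failures to fill k draws without a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples, 101 correct -> pass@1 ~ 0.505
print(pass_at_k(n=200, c=101, k=1))
```

The per-problem estimates are then averaged over the benchmark (164 problems for HumanEval) to give the reported percentage.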