We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
Abstract