HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Language-Model Evaluation #21

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

At the moment, we only have language-modelling loss to go by when experimenting with different architectures. Unfortunately, many changes, such as extra-gradient methods, different loss functions, different tokenisers or even different datasets, shift these loss values dramatically, making direct comparison almost impossible. Integrating a dedicated evaluation pipeline such as EleutherAI's eval-harness would give us certainty that one model is actually better than another and would let us compare ourselves against existing models such as GPT-J and GPT-3.
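
A minimal sketch of what such an integration could look like, assuming the `lm-eval` package (EleutherAI's eval-harness) is installed. The checkpoint and task list below are placeholders, not project decisions, and the backend name for the `model` argument differs between harness versions:

```python
# Hedged sketch: run EleutherAI's eval-harness against a public baseline model.
# "gpt2" and the task list are placeholders only.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                             # HuggingFace backend for the baseline
    model_args="pretrained=gpt2",           # placeholder reference checkpoint
    tasks=["lambada_openai", "hellaswag"],  # example tasks; the final list is TBD
    num_fewshot=0,                          # zero-shot, as in common GPT-J/GPT-3 reports
)
print(results["results"])                   # per-task metrics dict
```

Evaluating our own JAX checkpoints would additionally require a small adapter subclassing the harness's `LM` base class and implementing its log-likelihood and generation request handlers, so the harness can drive the model without knowing anything about JAX.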