EleutherAI / project-menu

See the issue board for the current status of active and prospective projects!

[Project] GPT-NeoX: an open-source framework for training language models with billions of parameters #12

Closed · StellaAthena closed this issue 1 year ago

StellaAthena commented 3 years ago

Project: GPT-NeoX

Codebase and Materials: codebase, training data

Project Lead(s): Sid Black (@sdtblck)

Currently Active Members: Alex Andonian (@alexandonian), Stella Biderman (@StellaAthena), Sid Black (@sdtblck), Preetham Gali (@preethamgali), Shivanshu Purohit (@ShivanshuPurohit)

Elevator Pitch: Massive language models like GPT-3 are incredibly powerful tools for research and industry alike. As they tend to be very expensive to develop, the groups that own them are very hesitant to share them with the public. Our goal is to train a suite of massive language models ranging in size from 1B to 200B parameters and make the pretrained models freely available for anyone to use.

Goal Outputs:

  1. An open-source codebase capable of training, evaluating, and distilling GPT-3-style language models as large as 200B parameters (see the distillation sketch after this list).
  2. Pretrained model checkpoints of a variety of sizes all the way up to 200B parameters.
  3. Several academic papers based on the results of our work, including "Lessons Learned Training a 200B Parameter Language Model on Commodity Hardware" and "Scaling Laws for Distilling Language Models."
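
For readers unfamiliar with the distillation objective behind goals 1 and 3, here is a minimal sketch of standard knowledge distillation (a soft-target KL term blended with ordinary next-token cross-entropy) in plain PyTorch. The temperature, weighting, and toy shapes are illustrative assumptions drawn from the general technique, not GPT-NeoX's actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target KL divergence (teacher -> student) blended with
    hard-label cross-entropy. temperature and alpha are illustrative
    hyperparameters, not values used by GPT-NeoX."""
    # Soft targets: match the student's distribution to the teacher's at a
    # raised temperature, scaled by T^2 as in standard knowledge distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary next-token cross-entropy against the data.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: batch of 2 sequences, 8 tokens each, GPT-2-sized vocabulary.
student = torch.randn(2, 8, 50257)
teacher = torch.randn(2, 8, 50257)
labels = torch.randint(0, 50257, (2, 8))
print(distillation_loss(student, teacher, labels))
```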

Milestones:

Current Status:

  - Preetham and Stella are implementing distillation functionality.
  - Alex and Sid are working on various minor fixes and adding new features.
  - Shivanshu is working on the eval harness integration.
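
For context on the eval harness integration, here is a minimal sketch of how EleutherAI's lm-evaluation-harness is typically driven from Python, assuming its simple_evaluate entry point and the Hugging Face model backend. The backend name, model, tasks, and exact argument names are illustrative and vary between harness versions; this is not the project's integration code.

```python
# Minimal sketch, assuming EleutherAI's lm-evaluation-harness is installed
# (pip install lm-eval). Backend, task, and argument names vary between
# harness versions; the model and tasks below are purely illustrative.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/gpt-neo-125m",
    tasks=["lambada_openai", "hellaswag"],
    num_fewshot=0,
)
print(results["results"])                            # per-task metric table
```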

How to Help: Check out the open issues.

Desired Support: We always need more GPUs.