VikParuchuri / textbook_quality

Generate textbook-quality synthetic LLM pretraining data
MIT License
488 stars 50 forks

Is this project a reproduction of the "Textbooks Are All You Need" paper? #11

Closed: Wangxiaoxiaoa closed this issue 1 year ago

Wangxiaoxiaoa commented 1 year ago

Is this project a reproduction of the "Textbooks Are All You Need" paper?

VikParuchuri commented 1 year ago

It generates synthetic data that can be used to train a phi replication or, more generally, to pretrain models.