VikParuchuri / textbook_quality

Generate textbook-quality synthetic LLM pretraining data
MIT License

Is this project a reproduction of the "Textbooks Are All You Need" paper? #11

Closed: Wangxiaoxiaoa closed this issue 8 months ago

Wangxiaoxiaoa commented 8 months ago

Is this project a reproduction of the "Textbooks Are All You Need" paper?

VikParuchuri commented 8 months ago

It generates synthetic data that can be used to train a phi replication, or to pretrain models more generally.