VikParuchuri / textbook_quality

Generate textbook-quality synthetic LLM pretraining data
MIT License
461 stars, 46 forks

custom documents #1

Closed: dangfutures closed this issue 8 months ago

dangfutures commented 9 months ago

Can this work with custom documents, not just URLs?

VikParuchuri commented 9 months ago

For retrieval? You can write a custom retrieval backend that retrieves from documents such as PDFs or .txt files. The current retrieval services are in services/adapators. I may look into writing this myself in the future, but it doesn't exist yet (although it wouldn't be very difficult to add).
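To illustrate what retrieving from local documents could look like, here is a minimal sketch that scores .txt files by how often the query's words appear in them. It is not the project's adaptor interface, which this thread does not describe: the function names (`tokenize`, `load_documents`, `retrieve`) and the scoring rule are assumptions made for the example.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_documents(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a {filename: text} mapping."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def retrieve(query: str, documents: dict[str, str], top_k: int = 1) -> list[tuple[str, int]]:
    """Return up to `top_k` (filename, score) pairs, best match first.

    Hypothetical scoring: a document's score is the total number of times
    any distinct query term occurs in it. Documents scoring 0 are dropped.
    """
    query_terms = set(tokenize(query))
    scored = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties broken alphabetically by filename.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

A call such as `retrieve("gradient descent", load_documents("my_docs"))` would return the best-matching filename and its score, where `my_docs` is a placeholder folder name. PDFs would need a text-extraction step first, and a real backend would likely use a stronger ranking method than raw term counts.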

VikParuchuri commented 8 months ago

I just pushed an experimental feature to the dev branch that enables using a custom search service for retrieval. You basically send it a query, and it returns a match. Very flexible in terms of how you host/include documents.
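As a rough illustration of the "send it a query, get back a match" idea, the sketch below wraps a small in-memory corpus in an HTTP endpoint using only the Python standard library. The request and response format the dev branch expects is not stated in this thread, so the `/search?query=...` path, the `query` parameter, the JSON shape, and all names here (`PASSAGES`, `best_match`, `SearchHandler`, `make_server`) are assumptions for the example only.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory corpus; in practice this could be loaded from files.
PASSAGES = [
    "Binary search halves the search interval at each step.",
    "A hash table maps keys to values using a hash function.",
]


def best_match(query: str, passages: list[str]) -> Optional[str]:
    """Return the passage sharing the most distinct words with the query.

    Returns None when no passage shares any word with the query.
    """
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.lower().split())), p) for p in passages]
    score, passage = max(scored, key=lambda pair: pair[0], default=(0, None))
    return passage if score > 0 else None


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed request shape: GET /search?query=some+words
        params = parse_qs(urlparse(self.path).query)
        query = params.get("query", [""])[0]
        payload = {"query": query, "match": best_match(query, PASSAGES)}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), SearchHandler)
```

Running `make_server().serve_forever()` and requesting `http://127.0.0.1:8000/search?query=binary+search` would return a JSON object containing the matched passage. Because the service only has to answer queries, the documents behind it can live anywhere: local files, a database, or a vector index.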