PygmalionAI / data-toolbox

Our data munging code.
GNU Affero General Public License v3.0

Implement GPT4All dataset #15

Closed TearGosling closed 1 year ago

TearGosling commented 1 year ago

Simple enough. I framed this slightly differently: rather than only tasking the model with following instructions, I sometimes put it in a "question-answer" mode instead, since much of GPT4All's dataset is about answering questions. Admittedly, not every entry in the dataset is a question, but some overlap between "follow instructions" and "generate answers to questions" should be fine. I can remove those prompts if needed.
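
For illustration only, here is a minimal sketch of the framing described above, not the code from this PR. The prompt strings, the function names (`pick_system_prompt`, `build_example`), the output keys, and the `qa_probability` parameter are all hypothetical; the idea is just that each entry gets either an instruction-style or a question-answer-style system prompt.

```python
import random

# Hypothetical prompt pools; the actual wording used in the PR is not shown here.
INSTRUCTION_PROMPTS = [
    "Follow the user's instructions.",
]
QA_PROMPTS = [
    "Answer the user's question.",
]


def pick_system_prompt(rng: random.Random, qa_probability: float = 0.5) -> str:
    """Pick a QA-style prompt with probability qa_probability, else an instruction-style one."""
    pool = QA_PROMPTS if rng.random() < qa_probability else INSTRUCTION_PROMPTS
    return rng.choice(pool)


def build_example(prompt: str, response: str, rng: random.Random) -> dict:
    """Wrap one prompt/response pair with a randomly chosen system prompt."""
    return {
        "system": pick_system_prompt(rng),
        "user": prompt,
        "model": response,
    }
```

Setting `qa_probability` to 0 would drop the question-answer prompts entirely, which corresponds to the "remove said prompts" option.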

0x000011b commented 1 year ago

Closing this PR because it's stale. I ended up incorporating these changes in another commit instead of merging here.