PygmalionAI / data-toolbox

Our data munging code.
GNU Affero General Public License v3.0

Add Dolly dataset and "guess the instruction" task #26

TearGosling closed this 1 year ago

TearGosling commented 1 year ago

This adds Dolly, a small, entirely human-made dataset (of admittedly dubious quality) totaling about 15k entries. I also want to try implementing a "Guess the Instruction" task, inspired by this paper, to see how it affects model performance, even though the dataset is small compared to the other data we're feeding into Pyg.
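For readers unfamiliar with the idea, here is a minimal sketch of how a "Guess the Instruction" example might be built from a Dolly record: the response (plus any context) becomes the prompt, and the original instruction becomes the target. It assumes records carry `instruction`, `context`, and `response` fields; the `Example` class, the `to_guess_the_instruction` helper, and the prompt wording are hypothetical and are not necessarily what this PR implements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Example:
    """Hypothetical container for one training example."""
    prompt: str
    target: str


def to_guess_the_instruction(record: dict) -> Optional[Example]:
    """Flip a Dolly-style record so the model must recover the instruction.

    Assumes the record has "instruction", "context", and "response" keys.
    Returns None when the instruction or the response is missing or empty.
    """
    instruction = (record.get("instruction") or "").strip()
    context = (record.get("context") or "").strip()
    response = (record.get("response") or "").strip()
    if not instruction or not response:
        return None

    # The prompt wording is illustrative only.
    parts = ["Guess the instruction that produced the response below."]
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Response: {response}")
    parts.append("Instruction:")
    return Example(prompt="\n".join(parts), target=instruction)
```

Records with an empty instruction or response are skipped instead of being emitted as degenerate examples, which seems sensible given the dataset's uneven quality.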

TearGosling commented 1 year ago

Tested this out and it works well enough, so I'm just gonna merge.