PygmalionAI / data-toolbox

Our data munging code.
GNU Affero General Public License v3.0
34 stars, 9 forks

Clean up ShareGPT dataset #16

Closed (TearGosling closed 1 year ago)

TearGosling commented 1 year ago

Cleaning ShareGPT data. Perhaps this should be generalized in another file, so that we can apply this to more datasets?

0x000011b commented 1 year ago

Closing this PR since it's stale - we ended up implementing this in a different way so it could be reused across tasks.