PygmalionAI / data-toolbox

Our data munging code.
GNU Affero General Public License v3.0

Add enjim dataset(s) #9

Open lloorree opened 1 year ago

lloorree commented 1 year ago

To run/set up:

To sanity-check the output (this counts the lines that still contain bizarrely formatted garbage, such as leftover tags or entities):

grep -E "<[^* '\n3A-Z=<\.]+>|\[[^A-Z* 0-9r\.]{0,10}\]|&[^ \n]{1,10};|:[^ \n0-9]{1,10}:" rev-020c8d0-args-d93e21a.jsonl -c
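
For anyone who would rather run the same check from Python, here is a rough sketch of an equivalent. The function name `count_suspicious_lines` is made up for illustration, and the comments on what each alternative catches are my guesses. Python's `re` and grep's ERE also treat escapes inside bracket expressions differently, so the count may not match grep exactly.

```python
import re

# The four alternatives from the grep command, split out for readability.
# Assumption: Python's `re` is close enough to `grep -E` for a rough check.
SUSPICIOUS = re.compile(
    r"<[^* '\n3A-Z=<\.]+>"          # presumably leftover HTML-like tags
    r"|\[[^A-Z* 0-9r\.]{0,10}\]"    # presumably short BBCode-style tags
    r"|&[^ \n]{1,10};"              # presumably HTML entities such as &amp;
    r"|:[^ \n0-9]{1,10}:"           # presumably :emoji: style shortcodes
)


def count_suspicious_lines(lines):
    """Count lines with at least one match, like `grep -c` does."""
    return sum(1 for line in lines if SUSPICIOUS.search(line))


sample = ["plain text", "a &amp; b", "<br>", "ok [1] ref", ":smile:"]
print(count_suspicious_lines(sample))
```

To run it against the real output, pass an open file handle instead of the sample list, e.g. `with open("rev-020c8d0-args-d93e21a.jsonl", encoding="utf-8") as f: print(count_suspicious_lines(f))`.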